ABSTRACT

The literature on Odia optical character recognition (OCR) reports notable gains from models designed for only 47–51 classes of Odia allographs. However, these models do not consider compound characters, which constitute about 70% of the entire Odia character system. Although compound characters occur less frequently than basic characters, they must be taken into account to develop a robust OCR system. The Odia script has more than 400 characters, counting both basic and compound characters, of which 211 classes are most commonly used. Existing single-stage pattern-recognition-based models fail to recognize these classes effectively. Therefore, in this chapter, a hybrid OCR model is designed to recognize these character classes. The proposed model works in three stages. In the first stage, the structural similarity index is used together with template matching to predict the 20 classes most similar to a given test sample. In the second stage, projection matching reduces the number of candidate classes by half. In the final stage, the actual class label is predicted using local frequency descriptor features and a general regression neural network. The proposed model is evaluated on a private dataset comprising 52,750 images from 211 classes and achieves an overall accuracy of 90.6%. Comparative analyses demonstrate the effectiveness of the proposed scheme over state-of-the-art methods.
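
The three-stage cascade described above can be illustrated with a minimal sketch. It is not the chapter's implementation, and it rests on several assumptions: all function names are hypothetical, each class is represented by a single template image, the structural similarity index is computed from global image statistics rather than local windows, raw pixel values stand in for the local frequency descriptor features, and the general regression neural network is written as a Gaussian-kernel weighted vote over the training samples. Only the shortlist size of 20 and the halving step come from the text.

```python
import numpy as np

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM from global statistics (images scaled to [0, 1]).
    Assumption: a simplification of the usual locally windowed SSIM."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def projection_profile(img):
    """Concatenated row sums and column sums of the image."""
    return np.concatenate([img.sum(axis=1), img.sum(axis=0)])

def grnn_predict(x, train_x, train_y, labels, sigma=1.0):
    """GRNN-style decision: Gaussian-kernel weighted vote per candidate class."""
    d2 = ((train_x - x) ** 2).sum(axis=1)
    # Subtracting the minimum distance rescales all weights equally
    # and avoids underflow; it does not change the winning class.
    w = np.exp(-(d2 - d2.min()) / (2 * sigma**2))
    scores = [w[train_y == c].sum() for c in labels]
    return labels[int(np.argmax(scores))]

def classify(img, templates, train_imgs, train_labels, k1=20, sigma=1.0):
    """templates: dict mapping class label -> template image (hypothetical layout)."""
    classes = np.array(sorted(templates))

    # Stage 1: keep the k1 classes whose templates are most similar by SSIM.
    sims = np.array([global_ssim(img, templates[c]) for c in classes])
    stage1 = classes[np.argsort(-sims)[:k1]]

    # Stage 2: projection matching keeps the closer half of the shortlist.
    p = projection_profile(img)
    dists = np.array([np.linalg.norm(p - projection_profile(templates[c])) for c in stage1])
    stage2 = stage1[np.argsort(dists)[: max(1, len(stage1) // 2)]]

    # Stage 3: GRNN over training samples of the remaining classes only.
    # Assumption: flattened pixels replace local frequency descriptor features.
    mask = np.isin(train_labels, stage2)
    feats = train_imgs[mask].reshape(mask.sum(), -1)
    return grnn_predict(img.ravel(), feats, train_labels[mask], stage2, sigma)
```

With `k1=20`, a call such as `classify(img, templates, train_imgs, train_labels)` narrows 211 classes to 20, then to 10, before the final decision. The point of the cascade is that the last, most expensive classifier only has to separate a small set of visually similar candidates.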