Artistic Multi-Character Script Identification

doi:10.1201/9780429277573-3

ABSTRACT

Script identification is a major concern in the optical character recognition (OCR) domain. OCR can identify characters, but there is no universal OCR for all scripts, because script recognition is a difficult task. So far, script identification has been studied at the paragraph, line, and word levels, but work at the character level remains limited. This chapter addresses a new problem in script identification: multi-character artistic script identification. A semi-automated segmentation algorithm separates the characters within each word, and a thinning procedure is then applied. Features are extracted using structural features and a Gabor filter. The approach is evaluated with several machine learning classifiers, and encouraging accuracy is observed. A statistical significance test is carried out to validate the results.
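As an illustration of the Gabor-based feature extraction mentioned above, the following is a minimal sketch and not the chapter's actual implementation. The function names (`gabor_kernel`, `filter_same`, `gabor_features`) and all parameter values (kernel size, sigma, wavelength, number of orientations) are assumptions chosen for the example; the abstract does not specify them. The sketch filters a grayscale character image with a small bank of oriented Gabor kernels and keeps the mean and standard deviation of each response as features.

```python
import numpy as np


def gabor_kernel(ksize, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor kernel (all parameter values are illustrative)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the sinusoidal carrier varies along direction theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength + psi)
    return envelope * carrier


def filter_same(image, kernel):
    """Zero-padded 2-D convolution via FFT, cropped to the input size."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    shape = (ih + kh - 1, iw + kw - 1)
    spectrum = np.fft.rfft2(image, shape) * np.fft.rfft2(kernel, shape)
    full = np.fft.irfft2(spectrum, shape)
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + ih, left:left + iw]


def gabor_features(image, n_orientations=4, ksize=15, sigma=3.0, wavelength=6.0):
    """Mean and standard deviation of the response at each orientation."""
    image = np.asarray(image, dtype=float)
    feats = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations
        kernel = gabor_kernel(ksize, sigma, theta, wavelength)
        response = filter_same(image, kernel)
        feats.extend([response.mean(), response.std()])
    return np.array(feats)


if __name__ == "__main__":
    # Synthetic stand-in for a thinned character image: vertical stripes.
    stripes = np.tile(np.cos(2.0 * np.pi * np.arange(32) / 6.0), (32, 1))
    print(gabor_features(stripes).shape)
```

In a pipeline like the one described, a vector of this kind would be concatenated with the structural features before being passed to a classifier. The choice of four orientations and a single scale is only for brevity; a practical filter bank would likely use several scales as well.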