ABSTRACT

Optical character recognition (OCR) is a blanket term for a number of processes by which written or typed text is electronically translated into machine-readable text. The technology is “optical” in that, like a human reader, it scans the document for characters that are relevant to some output goal. Some applications are inherently mechanical and literally look at the document through a high-speed lens; others assess the image of a document after it has been scanned by another device. OCR has been present in the business and governmental world for more than 60 years, but it is a recent development in the realm of academic and public libraries. Now used by thousands of universities and institutions to capture searchable and editable text, OCR devices and software are essential in the rapidly developing world of digital libraries and textual preservation. With OCR, academic institutions can scan the pages of a book, recognize its text, load the images online, and allow users to instantly search for keywords or proper names without having to manually “flip” through each page of the work. The research value of such a technology is likely to be profound, permitting text comparisons and analyses never before possible.