Paragraph reconstruction for postscript documents with complex layout

doi:10.1201/b18660-58

ABSTRACT

Recently, electronic publishing has become an important branch of the publishing industry. Many software applications can convert traditional paper-based magazines into digital publications. Optical character recognition (OCR), a widely used technique, can convert scanned or photographed images into editable text. However, the paragraph structure lost in OCR reduces the interaction capability of the reconstructed document. This loss is especially problematic for people with visual impairment who read electronic documents with the help of a screen reader. Without paragraph structure, the screen reader can only output all the words in a document sequentially, which reduces the reader's efficiency in accessing specific information in a document and limits possible interaction with that information.