ABSTRACT

Introduction

Misprint detection for Japanese sentences is considered to be a difficult visual search task. One reason is that Japanese lexical units such as words and phrases, which usually consist of several characters, are visually ambiguous because there are no spaces between them. Moreover, about 3000 ideographic (Kanji) characters are in common use, and many of them fall into sets of visually similar characters. Figure 29.1 shows examples of a Japanese sentence and of similar character pairs. Using Japanese characters and sentences, this paper analyses two important factors, character similarity and sentence segmentation, that might determine the efficiency of misprint detection.

372 K. Yokosawa and M. Shimomura