Video text detection and extraction are two crucial preprocessing steps for video text recognition. Text detection first locates the text region. Then, because superimposed text is always embedded in a complex background, text extraction is needed to separate the text contour from that background. Finally, the actual recognition is in most cases performed by an optical character recognition (OCR) system. Text extraction is crucial because OCR-based recognition does not perform satisfactorily on the original text region with its complex background. For example, when we randomly selected 100 video text regions from news video and fed them directly to TH-OCR (a typical Chinese character recognition software package), the recognition accuracy varied between 60% and 94%. That is, there is still considerable room for improvement in recognition accuracy, and video text extraction is one way to achieve it.