ABSTRACT

In this paper we present a simple but effective approach to word segmentation and word tagging for Hanzi-based languages. The approach is built on a hybrid algorithm that combines learning capability with the flexibility of a digital processing system. Our ideas were implemented in a system with 72 tags and a corpus of 111,068 Chinese words (173,497 Chinese characters). In the closed test, the system reached 99.5% segmentation accuracy and 95.3% tagging accuracy. In the open test for the specified domain, it reached 91.6% segmentation accuracy and 89.6% tagging accuracy. Various applications of this system are also discussed.