ABSTRACT

Chinese word segmentation can be approached from three broad perspectives: the constituent structure of segmentation units; the syntactic constructions in which such units take part; and idiosyncratic constructions that are not easily classified by these criteria. Segmentation by lemmatization and dictionary look-up will miss such lexical entries and the new meanings they carry. Corpus segmentation is also conducted with reference to a basic lexicon; hence, one premise of segmentation is that every word listed in the basic lexicon must be identified as a segmentation unit. All inflectional affixes are treated as segmentation units, unlike derivational affixes, which are typically combined with stems to form segmentation units. Predicate-complement compounds are always segmented by rules of combination. According to the basic segmentation principle, a string that has both an independent meaning and a fixed grammatical category is one segmentation unit.