ABSTRACT

The corpus is drawn from three Tibetan-language websites as they appeared in 2011 (listed in order): Qinghai Tibetan radio network, People’s Daily online Tibetan edition, and Tibetan culture net. Extraction method: pages from these websites were acquired in real time throughout 2011 and all of their text was collected; the collected material was then aggregated by day and deduplicated by content to form the sample web (news) corpus. For convenience in post-processing, the corpus is stored in plain-text format and was entered using the Tongyuan input method.
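The aggregation and deduplication step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually used to build the corpus: the function names `normalize` and `build_daily_corpus` are hypothetical, pages are assumed to arrive as (day, text) pairs, and a "duplicate" is assumed to mean text that is identical after whitespace normalization. The abstract does not state the deduplication criterion.

```python
import hashlib
from datetime import date


def normalize(text):
    """Collapse runs of whitespace so trivially different copies compare equal."""
    return " ".join(text.split())


def build_daily_corpus(pages):
    """Group page texts by day, skipping any text already seen in the corpus.

    pages: iterable of (day, text) pairs (assumed input shape).
    Returns a dict mapping each day to its list of unique texts,
    in the order they were first seen.
    """
    seen = set()   # hashes of every text accepted so far
    corpus = {}    # day -> list of unique texts
    for day, text in pages:
        cleaned = normalize(text)
        if not cleaned:
            continue  # ignore pages with no text
        digest = hashlib.sha1(cleaned.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # assumed criterion: exact match after normalization
        seen.add(digest)
        corpus.setdefault(day, []).append(cleaned)
    return corpus


# Hypothetical sample input: the second and third entries repeat the first.
sample_pages = [
    (date(2011, 1, 1), "བཀྲ་ཤིས་བདེ་ལེགས།"),
    (date(2011, 1, 1), "  བཀྲ་ཤིས་བདེ་ལེགས།  "),
    (date(2011, 1, 2), "བཀྲ་ཤིས་བདེ་ལེགས།"),
    (date(2011, 1, 2), "ཐུགས་རྗེ་ཆེ།"),
]
sample_corpus = build_daily_corpus(sample_pages)
```

Hashing the normalized text keeps memory use small over a full year of pages. A real pipeline might instead need near-duplicate detection, which this sketch does not attempt.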