ABSTRACT

Bilingual corpora are built to acquire linguistic information. Such corpora are important in applications such as machine translation[1], bilingual dictionary compilation[2], and cross-language information retrieval[3]. How a corpus is processed depends on its intended application. The units of bilingual alignment include the paragraph, sentence, phrase, and word, and different applications may choose different alignment units. Sentence alignment, however, is the most commonly used.