Bilingual and multilingual corpora: pre-processing, alignment and exploitation
This chapter begins by considering two types of bilingual and multilingual corpora, parallel and comparable corpora, and then focuses on parallel corpora alone. It gives a brief overview of the applications of parallel corpora and considers the issues involved in collecting and selecting texts for inclusion in a parallel corpus. Readers are then shown how to pre-process texts to prepare them for alignment or bilingual concordancing. A discussion follows of what is involved in alignment, the procedure used to prepare parallel texts for linguistic analysis. The final section provides practical examples that indicate how parallel corpora can help with language learning and translation.