ABSTRACT

In this chapter we show that corpora, particularly parallel bilingual corpora, are essential to the development of automatic machine translation (MT) systems, whether these are based on translation memories, on examples or on statistical models. Specific topics examined are the Europarl corpus, similarity measures for sentence matching, the Hofland sentence aligner, the automatic generalisation of translation examples through paraphrasing and the discovery of templates, statistical methods for building bilingual dictionaries, the development of MT for less-resourced languages and the evaluation of MT systems.