Phrase table filtration based on virtual context in phrase-based statistical machine translation

doi:10.1201/b15936-78

Chapter

ABSTRACT

The current state-of-the-art model in statistical machine translation (SMT) is the hierarchical phrase-based model (Koehn et al., 2003; Zens et al., 2002; Koehn, 2004; Chiang, 2005; Chiang, 2007). The phrase table is the essential translation knowledge in SMT: it contains a large number of phrase pairs, each consisting of a source-language phrase and a target-language phrase. When a phrase-based SMT system is constructed, the phrase table is obtained automatically from a parallel corpus (Och, 2004), and the target language model is trained on a monolingual target-language corpus. This construction phase is called training. In translation, a source sentence is first segmented into a sequence of phrases, and each phrase is then translated into a target phrase using the phrase table. The target phrases are recombined to generate a target sentence. This translation phase is called decoding, and the module that performs it is called the decoder. The decoder outputs the candidate translation with the maximum probability among all candidates.
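
The decoding steps described above (segment the source sentence, look up each phrase in the phrase table, recombine, and keep the most probable candidate) can be illustrated with a minimal sketch. It is not the chapter's method: it assumes monotone decoding (no reordering), omits the language model, and scores a candidate simply as the product of its phrase translation probabilities. The phrase table entries, their probabilities, and the names `PHRASE_TABLE` and `decode` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy phrase table: source phrase -> [(target phrase, probability)].
# The entries and probabilities are invented for illustration.
PHRASE_TABLE = {
    ("das",): [("the", 0.7), ("that", 0.3)],
    ("haus",): [("house", 0.8), ("home", 0.2)],
    ("das", "haus"): [("the house", 0.9)],
    ("ist",): [("is", 1.0)],
    ("klein",): [("small", 0.6), ("little", 0.4)],
}


def decode(source_words, phrase_table, max_phrase_len=3):
    """Return (translation, probability) for the best monotone segmentation,
    or None if the sentence cannot be covered by the phrase table.

    Simplifying assumptions: no reordering, no language model; a candidate's
    score is the product of its phrase translation probabilities.
    """
    n = len(source_words)
    # best[i] holds (log-probability, target phrases) for the best
    # translation of the first i source words, or None if unreachable.
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_phrase_len), end):
            if best[start] is None:
                continue
            src = tuple(source_words[start:end])
            for target, prob in phrase_table.get(src, []):
                score = best[start][0] + math.log(prob)
                if best[end] is None or score > best[end][0]:
                    best[end] = (score, best[start][1] + [target])
    if best[n] is None:
        return None
    return " ".join(best[n][1]), math.exp(best[n][0])


result = decode("das haus ist klein".split(), PHRASE_TABLE)
print(result)
```

In this toy table the two-word entry for "das haus" outscores the word-by-word alternatives, so the sketch selects the segmentation that uses it. Because every source span consults the phrase table, the table's size directly affects the amount of work the decoder does, which is the setting in which phrase table filtration becomes relevant.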