ABSTRACT

This chapter describes techniques used in statistical machine translation (SMT) for languages similar to Polish and discusses results in SMT for Polish, as well as their improvement. It examines methods used for exploring comparable corpora and explores the native Yalign method on a set of selected classifiers. Competitions accelerate the development of SMT systems. One of the most popular is the annual meeting organized by the Defense Advanced Research Projects Agency (DARPA); another is the annual International Workshop on Spoken Language Translation (IWSLT). Two main approaches to building comparable corpora can be distinguished. Perhaps the most common is based on cross-lingual information retrieval. In the second, source documents must be translated using a machine translation system. The baseline Polish SMT system advanced from almost last place, and its progress indicates that the system could be compared to systems for mainstream languages such as German.