ABSTRACT

This paper examines variation in translations produced by professional and novice translators. This variation is reflected in the translation product, more precisely in its linguistic features, e.g. preferences for certain classes of modal verbs or the proportion of nominal vs. verbal phrases. These features allow us to analyse and model translation variation. Methodologically, we focus on the quantitative distributions of these features, which are reflected in the lexico-grammar of texts and in their cohesion, and which call for a corpus-based method involving statistical analysis. For our analysis, we used a corpus in which student and professional translations share the same source texts and therefore represent translation variants of the same texts. To investigate the variation in these data, we used text classification, a set of methods derived from data mining. With the help of text classification techniques, we could automatically distinguish between texts translated by students and those translated by professionals. Moreover, the feature weights in the classification output show whether a given set of features helps to distinguish between the two types of translation, and allow us to assess which features are specific to each value of the variable (novice vs. professional). To represent the translated texts, we used linguistic features instead of the bag-of-words representation often used in traditional text classification approaches. We collected the frequency of each feature for every text in our corpus and labelled the data by class (professional or novice) to see whether our corpus data support these classes. We then analysed the output features of the classification to identify those that are specific to novice translations and those that are more common in professional ones.
In addition, we took a closer look at the misclassified cases, which can also be derived from the classification output. Our results show that automatically distinguishing between novice and professional translations is not an easy task, as the two translation varieties closely resemble one another. However, the results allow us to interpret the language patterns responsible for the analysed similarities and differences with the help of the theoretical frameworks and concepts underlying the feature formulation. Our study not only contributes knowledge about variation between novice and professional translations, but also suggests a method for exploring this variation descriptively.
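
The classification workflow summarised above (per-text feature frequencies, class labels, feature weights, misclassified cases) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the classifier, so a simple perceptron stands in for it, and the feature names and frequency values are invented toy data, not figures from the study.

```python
# Illustrative sketch only. The classifier (a perceptron), the feature names
# and all frequency values are hypothetical; none of them come from the study.

FEATURES = ["modal_verbs", "nominal_phrases", "cohesive_conjunctions"]

# One entry per text: (relative feature frequencies, class label).
# Label +1 = professional translation, -1 = novice translation.
TEXTS = [
    ([0.2, 0.8, 0.5], 1),
    ([0.3, 0.7, 0.6], 1),
    ([0.7, 0.3, 0.4], -1),
    ([0.8, 0.2, 0.5], -1),
]


def score(weights, bias, x):
    """Linear decision score for one feature vector."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def train_perceptron(data, epochs=20, lr=0.1):
    """Learn one weight per linguistic feature with the perceptron rule."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            # Update only when the text is on the wrong side of the boundary.
            if y * score(weights, bias, x) <= 0:
                weights = [w + lr * y * v for w, v in zip(weights, x)]
                bias += lr * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 (professional) or -1 (novice)."""
    return 1 if score(weights, bias, x) > 0 else -1


weights, bias = train_perceptron(TEXTS)

# Features ranked by absolute weight: larger magnitude = more discriminative.
# The sign shows which class a feature points towards.
ranked = sorted(zip(FEATURES, weights), key=lambda p: abs(p[1]), reverse=True)

# Indices of texts the classifier assigns to the wrong class.
misclassified = [
    i for i, (x, y) in enumerate(TEXTS) if predict(weights, bias, x) != y
]

for name, weight in ranked:
    print(f"{name}: {weight:+.3f}")
print("misclassified texts:", misclassified)
```

In this toy setting a positive weight points towards the professional class and a negative weight towards the novice class. With real corpus data, the list of misclassified texts would identify the translations that most resemble the other group.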