ABSTRACT

This chapter explores two areas of corpus research in which the actual primary text and an idealized representation of that text differ substantially: non-native language and historical corpora. In the first part, we use two case studies to discuss the relationship between learner texts and corrected or revised versions of those texts. In the first study, we look at data from the multilayer Falko learner corpus of German, which contains target hypotheses created by native speakers. Using these hypotheses, we compare actual and expected behavior in the production of nominal compounds in L2 German. A second study looks at textual revisions in multiple aligned essay drafts from learners of English responding to annotated corrective feedback. We examine the types of tutor feedback that lead to changes and the extent of error reduction in each round of the revision process. The second part of the chapter looks at two historical corpora which represent the original orthography of the artifacts alongside normalized or modernized forms that facilitate research. In the first case, a corpus of ancient Coptic manuscripts is examined, showing the usefulness of multiple representations of a text in recognizing scribal errors and identifying manuscript fragments, as well as exploring questions in historical linguistics, such as the penetration of Greek loanwords into Coptic. In the final case study, a multilayer corpus of early German scientific texts is analyzed to describe the gradual process of orthographic stabilization and find quantitatively notable characteristics in each century of data from the corpus.