Investigating learner language development with electronic longitudinal corpora: Theoretical and methodological issues
Recent theoretical developments in SLA have left us with a complex and sophisticated research agenda. To take the example of just one framework, questions about the initial state of second language learners, as well as about their ultimate attainment, are generating a great deal of research.

To test current hypotheses, we need to study development as it takes place within individual learners. We can only say that development has stopped, or that it is still taking place, by following learners over long periods of time. Cross-sectional data are of little help here, because we have no way of knowing whether the linguistic representations we are comparing across learners are in fact similar. Longitudinal data, however, raise problems of their own. Such datasets are very expensive to collect, and following learners over long periods can be extremely difficult in practice.

Given the enormous resource and logistical costs of collecting good-quality longitudinal corpora, it is now imperative that we make them widely available to the research community, so that different researchers can work on them from different points of view. In addition, datasets must be of sufficient size if they are to be truly representative of learner development. Finally, but very importantly, the research community must agree on conventions for transcribing, storing, and analyzing such data, and must make use of available software to analyze it. The SLA research community lags far behind other fields in taking advantage of computerized tools to further its research agenda.

This chapter will discuss the theoretical reasons for collecting and sharing good-quality longitudinal learner corpora. It will then explore the methodological issues that arise, before presenting a large database of French learner language oral corpora, freely available to the research community on the internet. The possibilities offered by the computerized analysis of corpora will also be demonstrated through a range of examples.