ABSTRACT

This chapter explains the methodological, institutional and technical issues in contemporary corpus linguistics. It attempts to show how much work in the area continues in the linguistic tradition of J. R. Corpus linguistics is the branch of linguistics that studies language on the basis of corpora. The texts that constitute such corpora are typically stored and processed in electronic form, so that for many commentators a corpus is by definition held on some kind of computer storage medium and analyzable automatically or semi-automatically. The requirement that corpora consist of naturally occurring texts means that the sets of unconnected and sometimes invented sentences used in Natural Language Processing, and especially in speech processing, cannot be considered 'corpora' for current purposes; nor can the kind of data elicited from native speakers by American structural linguists in the first half of the twentieth century.