ABSTRACT

In the most general sense, a corpus (plural: corpora) may be defined as a body or collection of linguistic data for use in scholarship and research. Since the early 1960s, interest has increasingly focused on computer corpora, or machine-readable corpora, which are the main subject of this article. In the first three sections, however, I shall begin by considering the place in linguistic research of corpora in general, whether machine-readable or not. In the remaining sections I shall consider why computer corpora have been compiled or collected; what their functions and limitations are; and what their applications are, in particular their use in natural-language processing (NLP). This article will illustrate the field of computer corpora only by reference to corpora of Modern English.