Corpus Linguistics

doi:10.4324/9781315852515-19

ABSTRACT

Corpus linguistics (henceforth “CL”) is the software-based, quantitative investigation of a collection of electronic texts. Such a collection is referred to as a “corpus”: a body of texts that is usually compiled in a principled manner. There has never been a time when so many English-language data have been readily available for investigation. The World Wide Web contains many billions of words of English usage and is increasingly being trawled for corpus construction. Advances in computer memory and search software mean that gigantic corpora consisting of billions of words, derived from the web and elsewhere, can readily be stored and swiftly explored. With these technological developments, linguists in the twenty-first century are in an exciting position to investigate English use on a massive scale. It is no exaggeration to claim that the use of corpora is revolutionizing English language description.