ABSTRACT

A corpus (plural corpora) is a large collection of language, usually held electronically, that can be used for linguistic analysis. The earliest known corpora were compiled by hand and consisted of biblical texts. In the modern era, an early electronically stored corpus was the Brown Corpus, developed at Brown University, USA, in the early 1960s and consisting of one million words. Other notable, more recent corpora are the Bank of English, developed by COBUILD at Birmingham University, UK, which consists of well over 500 million words; the British National Corpus (BNC), consisting of 100 million words; and the Corpus of Contemporary American English (COCA), consisting of over 425 million words and still growing.