ABSTRACT

The word corpus (plural corpora) comes from Latin. According to the Oxford English Dictionary, its sense of ‘body of a person’ dates from the mid-fifteenth century, and the sense of ‘collection of facts or things’ appeared later, in 1727. In 1956 the meaning was extended to include ‘the body of written or spoken material upon which a linguistic analysis is based’. The large collections of index cards used by early dictionary compilers were in fact human-readable language corpora. As Leech (1992) observed, linguists and grammarians had used collections of text to study language long before the invention of the computer; he therefore suggested that ‘computer corpus linguistics’ would be a more appropriate term for studies based on language databases today. In linguistics, a corpus is a large collection of machine-readable texts, compiled for a specific purpose, that can be searched and retrieved with dedicated computer software for linguistic research.