ABSTRACT

This chapter introduces corpora and corpus analysis tools. Corpora are used to develop a range of general language tools that have become commonplace in everyday life, including spell checkers, autocorrect options in text editors and web browsers, and even sophisticated machine-translation programmes. Containing anywhere from many thousands to billions of words, corpora are used in conjunction with special text-retrieval software known as concordancers. Concordancers enable one to manipulate and interrogate a corpus in a way that is very different from reading texts from start to finish, often providing insights into the language represented by the corpus that are not visible to the naked eye. Concordancers perform three basic types of operation: generating concordances, word lists and collocation statistics. Rather than skipping from one occurrence of a search item to the next, the concordance option lists all occurrences of that item together, displaying them vertically along with the context in which they appear, as exemplified by the sample concordances for married.
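
The three operations named above can be illustrated with a minimal Python sketch. It is not how any particular concordancer is implemented: the function names (tokenize, concordance, word_list, collocates), the window sizes and the three-sentence sample text are all assumptions made for illustration, and the collocation step returns raw co-occurrence counts only, not an association statistic of the kind a real tool would report.

```python
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, keyword, width=3):
    # Keyword-in-context lines: every hit, aligned vertically with its context.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>25}  {tok}  {right}")
    return lines

def word_list(tokens):
    # Frequency-ranked list of all word forms in the corpus.
    return Counter(tokens).most_common()

def collocates(tokens, keyword, span=2):
    # Raw counts of words occurring within `span` tokens of the keyword.
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span])
    return counts

# Invented sample text, used only to exercise the functions.
sample = ("They got married in June. She married a doctor. "
          "He was married to his work.")
tokens = tokenize(sample)

for line in concordance(tokens, "married"):
    print(line)
```

Calling word_list(tokens) and collocates(tokens, "married") on the same token list gives the other two views of the data. A real corpus would be read from files and would need a more careful tokenizer than the regular expression used here.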