ABSTRACT

Linguistic research requires empirical evidence to give satisfactory answers to questions such as: to what extent is a phenomenon X present in the language system? Or what is the difference between choices X and Y? Such evidence can be provided by a corpus, ‘a collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language’ (Sinclair 1996: 2). A modern computer-based corpus comes with an interface for retrieving relevant linguistic constructions, such as sequences of word forms and, often, lemmas (that is, dictionary headwords) and generic part-of-speech (POS) tags.