ABSTRACT

Before we can build a corpus to represent a variety of a language, we must first acknowledge the complexity of what the term 'language variety' encompasses. This chapter reviews the concept of language variety and considers what its broad range of reference means for the practical challenges of designing and building corpora. To this end, it explores the fundamental distinction between user-related and use-related varieties. The choice to build a use-related or a user-related corpus (or a blend of both) directly shapes a corpus designer's decision-making, particularly with respect to core concerns such as size, representativeness and balance. The chapter discusses these concerns with reference both to well-established, more traditional corpora, such as BNC1994 and the ICE suite of corpora, and to more recent ones, such as BNC2014, the GloWbE project and the enTenTen family of corpora. The benefits of designing and building corpora that capture multiple varieties of a language, both use- and user-related, are illustrated through the 1-million-word Limerick Corpus of Irish English, which is used to explore pragmatic characteristics of Irish English. This approach complements (relatively) recent disciplinary paradigms such as variational pragmatics and the rapidly emerging field of corpus pragmatics.