ABSTRACT

This chapter focuses on lexis as a core topic of concern for the digital study of language. In particular, we discuss the key issue of what lexical items represent in digital humanities (DH) research. Very little work in DH is interested in lexical items for their own sake – instead, lexis is used as a proxy for measuring cultural significance, an author’s style and how much attention is being paid to a concept within a text. We therefore organise the ‘Critical issues and topics’ section of the chapter around three themes: what lexis can tell us about language (as noted above), what is key in the study of lexis (including frequency, usage, collocation, semantic prosody and metaphorical extension) and what problems we face when using lexical items in English language research (such as polysemy, homonymy and spelling variation). In the ‘Current contributions and research’ section, using the perspective of degrees of ‘curation’ of data, we survey sources of information about lexis (dictionaries and thesauri such as the OED, the Historical Thesaurus of English, WordNet and the other major dictionaries of English), followed by sources of lexical data, primarily corpora, and finally major tools for the study of lexis, focusing on semantic tagging software, lemmatisers and spelling normalisers.

Finally, we demonstrate major research techniques in this area (from lexis to corpora, from corpora to lexis, or both through the perspective of a connected semantic field) by means of a study of lexis in one category of the Historical Thesaurus (nouns in 03.12.15 Money), showing its variation in terms of semantic prosody, its evolution, its internal structure, its metaphorical extensions to other types of lexis and how evidence of its use shapes our understanding of the field.