ABSTRACT

This chapter introduces corpus linguistics – a group of methods that use specialist computer programmes to study language in large bodies of machine-readable text. We outline the key principles and approaches of corpus linguistics, and discuss and demonstrate its current and future contribution to digital humanities research. The chapter begins by outlining some of the general features of corpus linguistics, including its main strengths and limitations, before reviewing a series of key debates in the field. We then discuss corpus linguistics’ current contributions to research in linguistics and the digital humanities, including how corpus techniques compare with other digital methods that are currently more popular amongst humanities researchers. Next, we introduce a series of established corpus techniques – namely, frequency, collocation, keywords and concordance – and demonstrate their usefulness through a sample analysis of a large collection – or ‘corpus’ – of online patient feedback about the National Health Service in England. We follow this case study, which sits at the interface of the digital and health humanities, with a summary of the chapter’s main arguments and discussion points, and close by looking to the future of corpus linguistics in the digital humanities.