ABSTRACT

This article considers how, and why, "Topic Modelling" tools can be used to analyse historical newspaper archives. While a growing number of media and communication studies projects have applied these techniques to corpuses of born-digital journalism, using the same tools to analyse large-scale collections of historical newspapers requires us to overcome additional technological and methodological challenges. Our discussion is framed around a historical case study examining references to the United States in the 19th Century British Library Newspaper Archive. The article begins by highlighting the problems that researchers of both digital and historical journalism face when attempting to deal with an enormous body of evidence. Next, it argues that Topic Modelling offers one potential solution to these problems by providing a way to “distant read" the archive. The remainder of the article is divided into five experiments that demonstrate how Topic Modelling can be applied to a series of research questions, each of which is applicable to other projects that might make use of newspaper archives. As well as demonstrating the investigative potential of topic modelling, the article also highlights the practical and technological barriers that currently undermine its effectiveness, particularly when it is applied to archives of historical material.