From a “Bag of Names” to a “Name Index”: Using Wikipedia and Wikidata to Create an Enriched List of Person Names

doi:10.4324/9781003327677-4

ABSTRACT

Working with texts is the basis of most research in the humanities. Accordingly, Digital Humanities (DH) has made extensive use of Natural Language Processing methods to extract information from these texts. Named entity recognition, which locates and classifies mentions of persons, organizations, locations, and so on, is one of the most frequently used methods. Often, however, the extracted list of names amounts to a “bag of names”, that is, a collection of names without context.

This chapter illustrates how an open knowledge base can add depth to the large sets of named entities extracted from textual corpora. We present a way to create an index of relevant person names using the largest linked open database, Wikidata. To make the example concrete, we showcase the method in a case study that looks at DH theory through the names of its theorists.

We also share a short guide to reusing this approach in other applications, including instructions on how to use the Wikipedia API and how to write useful SPARQL queries (SPARQL being the language used to query Wikidata). We propose that the resulting enriched index, as opposed to the bag of names, can add a new layer of interpretation to other typical macro analyses such as topic models. It can even uncover hidden bias in who is mentioned, in terms of the authors’ origin, gender, and generation, to name just a few possible dimensions of analysis.
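
As a rough illustration of the two steps named above, the following Python sketch first asks the Wikipedia API for the Wikidata item linked to a page title and then builds a SPARQL query for a few attributes of that item. It is a minimal sketch under stated assumptions, not the chapter's own implementation: the function names, the user-agent string, the choice of the English Wikipedia, and the selection of properties (P21 sex or gender, P569 date of birth, P27 country of citizenship) are illustrative choices made here.

```python
import json
import urllib.parse
import urllib.request

# Public endpoints; the English-language Wikipedia is an assumption of this sketch.
WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"
# Hypothetical user-agent string; replace with one that identifies your project.
USER_AGENT = "name-index-sketch/0.1 (illustrative example)"


def wikipedia_lookup_url(name):
    """Build a Wikipedia API URL that returns the Wikidata item ID for a page title."""
    params = {
        "action": "query",
        "titles": name,
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "redirects": 1,
        "format": "json",
    }
    return WIKIPEDIA_API + "?" + urllib.parse.urlencode(params)


def qid_from_response(data):
    """Return the first Wikidata ID found in a parsed API response, or None."""
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid:
            return qid
    return None


def person_query(qid):
    """Build a SPARQL query for gender, birth date and citizenship of a human."""
    return f"""
SELECT ?genderLabel ?birth ?countryLabel WHERE {{
  BIND(wd:{qid} AS ?person)
  ?person wdt:P31 wd:Q5 .                   # keep only items that are humans
  OPTIONAL {{ ?person wdt:P21 ?gender . }}    # sex or gender
  OPTIONAL {{ ?person wdt:P569 ?birth . }}    # date of birth
  OPTIONAL {{ ?person wdt:P27 ?country . }}   # country of citizenship
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
"""


def fetch_json(url):
    """Fetch a URL and parse the JSON body (performs a network request)."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def enrich(name):
    """Look up one name and return the raw SPARQL result rows, or [] if unmatched."""
    qid = qid_from_response(fetch_json(wikipedia_lookup_url(name)))
    if qid is None:
        return []
    query = urllib.parse.urlencode({"query": person_query(qid), "format": "json"})
    return fetch_json(WIKIDATA_SPARQL + "?" + query)["results"]["bindings"]
```

Calling `enrich` on each name in the extracted list would yield one or more rows per matched person. Names that match no page, or that match a page about something other than a human, return an empty list and would need manual review or disambiguation.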