ABSTRACT

This chapter describes the genesis and development of a suite of tutorials and code, created at the University of Texas at Austin Libraries, for performing natural language processing (NLP) on non-English texts. It first gives an overview of the current state of digital humanities (DH) work involving the study of language data, highlighting the clear bias toward English-language analysis and corpora in DH and focusing on the role that libraries and cultural heritage centers play in this work. It then highlights ongoing efforts toward greater linguistic and cultural inclusivity in DH and describes how the projects at the UT-Austin Libraries contribute to the broader representation of non-English languages.