ABSTRACT

This chapter responds to the ongoing development of multilingual digital humanities as it relates to multilingual texts. Specifically, it complicates the question of what “multilingual” and “text” mean in situations of language contact that render both terms ambiguous. In many cases of language contact, it is difficult (if not impossible) to ascribe a particular ‘word’ categorically to any one linguistic variety: code-mixing between Italian and English, for example, produces forms such as fensa ‘fence’, with an English nominal stem but Italian inflectional morphology. The chapter then turns to one specific historical document: the Dictionnaire de la langue franque of 1830. This dictionary, written by an anonymous author in Marseille, is purported to be the most comprehensive lexical record of a Mediterranean trade language used throughout the early modern period. The question of how best to represent linguistic forms in a digital space ultimately leads to further questions about the corpus itself. In particular, the chapter discusses the issues that arise when standard language models are projected onto mixed-language data.