GPT-4-Trinis: Fine-Tuning an LLM on the Grammar of an Underrepresented English Variety

doi:10.4324/9781003487623-5

ABSTRACT

Making large language models (LLMs) more accessible to speakers of non-standard varieties has been a critical focus of recent research. This chapter explores how low-effort, inexpensive fine-tuning of LLMs can support high-quality translation into a non-hegemonic variety of English by concentrating on the linguistic features expected to pose the greatest problems for machine translation (MT) systems. Building on previous work, we conducted three experiments to investigate how best to fine-tune an LLM to translate between Trinidadian English Creole (TEC) and Standard English (SE), focusing on seven critical grammatical features. We found that GPT-4o translated more accurately than GPT-4. Accuracy was greatest with heterogeneous training data that included 48 prompts in each translation direction (SE to TEC and TEC to SE) per grammatical category. Adding leading context sentences, by contrast, proved unnecessary.