ABSTRACT

Code-switching is a widespread linguistic phenomenon that has recently received increased attention as a Natural Language Processing (NLP) problem. This survey provides an extensive overview of recent advances in textual code-switching research from the NLP perspective. We explain the theoretical background behind code-mixing, describing various linguistic approaches and metrics for quantifying the amount and significance of code-switching in texts. Next, we list corpora containing language alternation, organized by the NLP tasks they support. We then give an overview of the NLP techniques applied to these tasks and of recently introduced evaluation benchmarks. Finally, we describe open problems in code-switched NLP and suggest future research directions.