ABSTRACT

This chapter presents the methodological considerations, both challenges and opportunities, involved in investigating the language of fake news across cultures. These considerations draw on the Fakespeak project, an interdisciplinary collaboration between linguists and computer scientists with two objectives: first, to reveal the language and style of fake news in English, Norwegian, and Russian and, second, to determine whether the findings can help improve existing fake news detection systems. Collecting data in three languages representing three different cultures (e.g. English in the USA) presented several methodological challenges that required innovative solutions. The challenges differed considerably from case to case: access to quality data (collections of fake and genuine news individually labelled for veracity by experts) was especially limited for Norwegian and Russian, albeit for different reasons. The datasets that have emerged from the Fakespeak project so far are very diverse, including both large datasets with news written by a mix of authors (so-called general datasets) and smaller datasets with fake and genuine news written by the same author (single-authored and hybrid, multiauthored datasets). This diversity of data types is important both for linguistic analyses of fake news, so that meaningful comparisons with genuine news can be made, and for the subsequent development of automatic detection systems.