ABSTRACT

This chapter introduces the machine learning and statistical techniques most commonly used in textual analysis. These include unsupervised methods, such as Wordfish, topic modelling, cluster analysis and continuous space word representations (Word2Vec), and supervised methods, such as Wordscores, support vector machines, decision trees, random forests and different families of artificial neural networks. It also covers two recent methods that perform aggregate distribution estimation, namely ReadMe and iSA. This introduction serves two purposes: it explains how machines transform text into meaningful statistics and insights, and it conveys the idea that human supervision is an essential step of this process, whatever technique is used. To paraphrase Gary King: nowadays, “social science needs to be computer-assisted, but has to be human-empowered”. The chapter is accompanied by R code to ease understanding of the topics.