ABSTRACT

Sentiment analysis is a widespread technique for computing the emotional loading of a text to determine whether it is “positive”, “negative” or “neutral”. It has been broadly adopted in Twitter-based public opinion research because it provides a framework for approximating support for a concept of interest, such as a political party or candidate. However, we argue that much of the use of sentiment analysis, specifically in research that forecasts election results from Twitter data, has relied on a number of questionable design decisions that may contribute to the mixed track record of sentiment-assisted, Twitter-based election forecasting. Crucially, previous publications (1) make no distinction between linguistic and political sentiment, potentially undermining measurement validity; (2) typically analyze data at the tweet level rather than the user level, biasing results towards prolific tweeters; and (3) treat relevant “negative” tweets as uninformative.
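To illustrate point (2), the following minimal Python sketch contrasts the two units of analysis on hypothetical toy data. It assumes each tweet has already been classified as positive towards a single candidate. The function names `tweet_level_shares` and `user_level_shares`, the candidate labels, and the rule that assigns each user to the candidate they tweeted about most often are illustrative assumptions, not the procedure used in this chapter.

```python
from collections import Counter, defaultdict


def tweet_level_shares(tweets):
    """Vote share estimate in which every positive tweet counts as one vote."""
    counts = Counter(candidate for _, candidate in tweets)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def user_level_shares(tweets):
    """Vote share estimate in which every user counts exactly once.

    Each user is assigned to the candidate they tweeted about positively
    most often (a hypothetical rule); ties are broken alphabetically by
    candidate label, which is an arbitrary choice.
    """
    per_user = defaultdict(Counter)
    for user, candidate in tweets:
        per_user[user][candidate] += 1
    votes = Counter()
    for counts in per_user.values():
        # Highest count wins; alphabetical order breaks ties.
        best = min(counts, key=lambda c: (-counts[c], c))
        votes[best] += 1
    total = sum(votes.values())
    return {c: n / total for c, n in votes.items()}


# Hypothetical toy data: one prolific user and three occasional users.
tweets = [("user_1", "candidate_a")] * 8 + [
    ("user_2", "candidate_b"),
    ("user_3", "candidate_b"),
    ("user_4", "candidate_b"),
]

# Tweet level: candidate_a receives 8/11 (about 0.73) of the share.
print(tweet_level_shares(tweets))
# User level: {'candidate_a': 0.25, 'candidate_b': 0.75}
print(user_level_shares(tweets))
```

In this toy example a single prolific user dominates the tweet-level estimate, while the user-level estimate weights each account equally, which is the bias described above.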

In this chapter, we provide an overview of different forms of sentiment analysis, their uses, and their previous applications in Twitter-based election forecasting specifically. We then conduct a comprehensive empirical analysis using a novel three-fold case study design, applying the same methodologies to predict three separate contests of the 2016 U.S. Democratic presidential primary: New Hampshire, South Carolina and Massachusetts. Besides replicating previously applied methodologies, we extend the sentiment analysis toolkit for Twitter-based public opinion research with a method for ordinal, intensity-focused political sentiment classification, and we further develop a modeling approach that incorporates negatively classified information. We present 12 vote share prediction models for all three primaries.

We find that weighting Twitter data by computationally inferred user-level characteristics, such as home location and political affinity, improves the accuracy of sentiment-based vote share predictions, whereas including negative tweets does not consistently improve the analyses. Furthermore, we find that shifting the analysis from the tweet level to the user level benefits the resulting predictions.