ABSTRACT

A wide range of machine learning approaches can be applied to Twitter text. This chapter presents several automated content analysis approaches for analyzing large-scale Twitter data, namely sentiment analysis, network text analysis, topic modeling, and machine-learning-driven text classification, and discusses the strengths and weaknesses of each. The chapter briefly discusses the strategies used to collect Twitter data, followed by the steps necessary to preprocess and prepare the data for automated content analysis. With billions of users and hundreds of millions of posts and tweets per day, social media's big data have attracted the attention of the social sciences. Social network analysis is concerned with social actors and their relationships, as well as the topological structures that emerge from these relationships. The chapter also describes the text preprocessing steps that are applied in most automated content analysis procedures.