ABSTRACT

This book showcases the unique possibilities of corpus linguistic methodologies for engaging with and analysing language data from social media, surveying current approaches and offering guidelines and best practices for language analysis.

The book provides an overview of how linguists and non-linguists have approached language in social media, before identifying the dataset requirements for social media investigations and the technical aspects of particular platforms that may influence the analysis, such as emoticons, retweets, and metadata. Sample Python code, along with general guidelines for using it, is provided to help researchers apply these techniques in their own work, supported by examples from three real-life case studies. Di Cristofaro highlights the full potential of these methodologies for analysing social media language data and the ways in which they might pave the way for future applications of data analysis and processing in corpus linguistics.

The book will be key reading for researchers in corpus linguistics, and for linguists and social scientists interested in data-driven analysis of social media.

Chapter 1: Introduction (23 pages)

Chapter 2: Social media as digital research data (47 pages)

Chapter 3: Fundamentals of corpus linguistics (26 pages)

Chapter 4: Imagining the data: Corpus design (44 pages)

Chapter 5: Creating the data: Corpus collection (172 pages)

Chapter 6: Case studies (60 pages)

Chapter 7: Conclusion (6 pages)