ABSTRACT

Corpora can be gathered to represent historical data, contemporary data, or data that show progression over time. Nguyen et al. argue that, until recently, computational linguistics tools were applied more readily to the informational dimension of language and much less to helping people understand its social dimension. Predicting behavior, creating better translators, designing programs that can be used in forensic investigations, building models that can aid research, and identifying idiosyncrasies in professional or research practices are all worthwhile goals for corpus studies. Twitter is popular with language-and-society researchers because it gives access to a large amount of data in short, self-contained segments. The datasets created for analysis can be as large or as small as the goals of the project require. After years of work and successive iterations of scientific research, the biases and choices that shape how a phenomenon is written about can become invisible to those practicing in the field.