ABSTRACT

Twitter data are of interest from a sociolinguistic perspective because parameters such as author identity, topic of discussion, or the presence of particular grammatical or discourse features can be encoded into a data-collection algorithm at the corpus compilation stage. Tweets are delivered in a format that can be processed using a number of programming languages and software packages. The simplest way to collect data from Twitter is through the web client, which is accessible without a Twitter account. Although in the past the web client returned only tweets from the most recent seven days, it is currently possible to access tweets posted since 2006 via the “Advanced Search” option. Once the desired search parameters have been entered, the results can be copied and pasted into a text editor for further processing. Collecting data in this manner is time-consuming, however, and may make subsequent automatic processing difficult. A more sophisticated approach involves using software and/or scripts to access the Twitter application programming interfaces (APIs).
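
As an illustration of how such parameters might be encoded at the compilation stage, the following is a minimal sketch rather than a definitive implementation. It assumes that tweets have already been retrieved and stored as one JSON object per line, with the message in a `text` field and the author's handle in `user.screen_name`; the storage format, these field names, the function names `matches` and `build_corpus`, and the sample tweets are all assumptions made for the example and are not taken from the text above.

```python
import json
import re


def matches(tweet, authors=None, keywords=None, feature_pattern=None):
    """Return True if a tweet satisfies every criterion that was supplied.

    Field names ("text", "user", "screen_name") are assumed for illustration.
    """
    text = tweet.get("text", "")
    author = tweet.get("user", {}).get("screen_name", "")

    # Author identity: keep only tweets by the listed accounts.
    if authors and author.lower() not in {a.lower() for a in authors}:
        return False
    # Topic of discussion: require at least one keyword in the text.
    if keywords and not any(k.lower() in text.lower() for k in keywords):
        return False
    # Grammatical/discourse feature: require a regular-expression match.
    if feature_pattern and not re.search(feature_pattern, text, re.IGNORECASE):
        return False
    return True


def build_corpus(lines, **criteria):
    """Parse JSON lines and collect the tweets that match the criteria."""
    corpus = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        if matches(tweet, **criteria):
            corpus.append(tweet)
    return corpus


# Invented sample data, standing in for previously collected tweets.
sample = [
    '{"text": "I was like, no way!", "user": {"screen_name": "alice"}}',
    '{"text": "Election results are in.", "user": {"screen_name": "bob"}}',
    '{"text": "She was like totally done with the election", '
    '"user": {"screen_name": "carol"}}',
]

# Select tweets containing quotative "be like", a discourse feature.
quotative = build_corpus(
    sample, feature_pattern=r"\b(was|were|is|am|are)\s+like\b"
)
# Select tweets on a topic, and tweets by a particular author.
on_topic = build_corpus(sample, keywords=["election"])
by_author = build_corpus(sample, authors=["Bob"])
```

Because each criterion is optional and the criteria are combined conjunctively, the same function can compile a corpus defined by author, by topic, by linguistic feature, or by any combination of the three.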