ABSTRACT

Turning online talk into a data corpus involves several phases. First, the environment of interest should be characterized by its technological and situational features, so that it is clear how these features shape the resulting online talk. This characterization also allows study findings to be compared with those of previous and future studies. Next come macro-decisions that delimit the boundaries of the talk to be treated as data, followed by micro-decisions about sampling talk within these boundaries. Defining the macroscopic scope of talk involves four dimensions: the platform on which the talk occurs, the people involved, the topic being discussed, and the temporal period over which the talk takes place. Once the overall scope has been defined, micro-decisions come into play, including what portion of talk will be taken as the unit of analysis. With the unit of analysis set, decisions follow about sampling (the number of units to be analyzed), the time window (how the units relate to the larger structure of the data corpus), and the treatment of para-data included in the corpus (upvotes, hashtags, analytic data, etc.). This chapter also discusses how to extract and organize online talk for analysis and how to create a data archive for future use.