ABSTRACT

In the past 20 years or so, research on computer-mediated communication (CMC) in linguistics has examined language online from a variety of perspectives. Specifically sociolinguistic issues include variation and style in digital written language, processes of innovation and change, language and social identities, multilingualism and code switching, and the relation of language, digital media, and globalization. This and other research on CMC evolves in constant interaction with the socio-technological evolution of the internet, which I divide into three broad stages. In the pre-web era – that is, until the early 1990s – CMC is largely restricted to interpersonal (dyadic or group-level) exchanges carried out on applications (or modes) such as email, mailing lists, newsgroups, and Internet Relay Chat (IRC). In the early web era, from the mid-1990s to the mid-2000s, the emergence of the World Wide Web introduces personal homepages, web discussion forums, and corporate websites, followed by blogs. In the participatory web era, from the mid-2000s onward, people draw on the infrastructure provided by blogs, social networking sites, media-sharing sites, and wikis in order to both produce and consume web content. In the course of this development, digital media have evolved from socially exclusive to almost ubiquitous in the Western world, and from a small set of options for interactive written communication to a rich repertoire of multimodal and multimedia choices. The various modes of digital communication introduced in these three “eras” accumulate in implicational ways, with each era adding to the options offered by the previous one. These developments shape what is viewed as typical “internet language,” what is perceived as “research-worthy,” and what counts as relevant online data.
Based on an inclusive view of sociolinguistics that encompasses variationist, interactional, and discourse-oriented approaches to language in society, this chapter summarizes a range of issues related to online data collection. While it is increasingly possible to draw on compiled and annotated CMC corpora (Beißwenger & Storrer, 2008), this chapter focuses on issues related to the individual collection of original data. As it is practically impossible to neatly separate data collection from broader issues of methodology, parts of the discussion address conceptual, methodological, and analytic conditions that may affect data collection. The chapter first discusses how CMC challenges methodological assumptions in sociolinguistics and outlines data sampling criteria in the framework of Computer-Mediated Discourse Analysis. The next two sections introduce two distinctions that impact how we approach language online: viewing CMC as “text” or “place” and collecting data “on screen” or through contact with users. Subsequent sections discuss issues related to the modes and environments being sampled, multimodality, social identities and participation roles, units and sequences of online data, and research ethics.