ABSTRACT

The chapter addresses methodological issues in compiling a spoken corpus and weighs the advantages and drawbacks of different types of spoken corpora. The challenges concern the representativeness and size of the corpus, accessibility, mark-up and transcription, the annotation of the data with pragmatic information, and the balance between different genres.

Spoken language corpora open up new perspectives on the study of linguistic variation by making it possible to take the speech situation, society and culture into account. Studies in variational discourse analysis raise questions about comparability and about parallel spoken corpora.

Several current corpus projects take a sociolinguistic perspective and use spoken corpora to study pragmatic phenomena, such as discourse markers, in relation to the sociolinguistic variables of age, gender and class. Access to corpora of teenage language has made it possible to study age-related phenomena.

Topics of common concern to corpus linguistics and the digital humanities include language teaching, healthcare, ‘historical’ speech and culture, new spoken media, critical discourse analysis and language variation. This is illustrated by case studies that use learner corpora to gain a better understanding of learners' pragmatic competence. The pedagogical implications are wide-ranging and involve methods for helping speakers to master pragmatic skills, as well as materials development and testing. Using corpora to provide knowledge about features characteristic of healthcare practices can help improve communication between practitioners and the public. Corpus-assisted (critical) discourse analysis brings together ideology and corpora. By identifying recurrent lexical items, phrases and collocations, it is possible to reveal practices and structures that are not open to introspection. Future developments in this area involve moving from monomodal spoken corpora into the multimodal domain.