Building small specialised corpora

doi:10.4324/9780203856949-8

ABSTRACT

Writing a chapter about building audio-visual corpora is a challenge: this is an area of considerable growth in corpus linguistics, computational linguistics, behavioural sciences and language pedagogy, among others, and by the time this chapter appears, technological advances are likely to have moved the field substantially further forward. The chapter focuses primarily on corpora in which the transcripts are linked to the video or audio recordings, or in which the video data have been made searchable for certain coded features. It gives an overview of what the process of building an audio-visual corpus entails, from initial conception through project design to data collection, processing and finally the development of tools and interfaces for exploitation. Constructing such a corpus involves providing the links between the transcript and the audio or video files. The final section of the chapter looks towards the future and speculates on what advances may be made in the coming years.