Challenges in Transcription, Part I – Conventions
In corpus linguistics, one of the most important steps in spoken corpus construction is transcription. Without transcription that accurately represents what was recorded, it is hard to validate the findings of subsequent linguistic research on the corpus. This chapter discusses the transcription of the Spoken British National Corpus 2014 (Spoken BNC2014), starting with a justification of the decision to use a traditional manual approach rather than automated transcription. Then, the original British National Corpus (BNC) transcription scheme is evaluated, revealing several weaknesses that were addressed in the compilation of the Spoken BNC2014. Following this is a description of the main features of the Spoken BNC2014 transcription scheme, including the transcription of overlaps and filled pauses and the minimal use of punctuation. Transcription was conducted by a team of transcribers at Cambridge University Press; the chapter concludes by discussing the rigorous audio-checking and proofreading procedures adopted to ensure high-quality transcription.