ABSTRACT

Lexicographers and linguists have long hoped for corpus evidence about spoken language, but the practical difficulties of transcribing sufficiently large quantities of text have until recently prevented the construction of a spoken corpus of over one million words. Now, as part of the British National Corpus (BNC) project,¹ Longman have produced an orthographically transcribed corpus of 10 million words covering a wide range of speech variation. A large proportion of the spoken corpus – approximately five million words – comprises spontaneous conversational English. The importance of conversational dialogue to linguistic study is unquestionable: it is the dominant component of general language in terms of both language reception and language production. Gathering this amount of conversational English has been made possible by employing a unique method of data collection, which is briefly described below (for more information see Crowdy 1993). In addition, this chapter considers (a) the methods used to process the 1200 hours of recordings and (b) the transcription scheme employed.