ABSTRACT

Throughout the development of corpus linguistics there has been a noticeable focus on analyzing written language. With some written corpora now exceeding one billion words, the possibilities for generating new insights into how language is structured and used are both exciting and unprecedented. Spoken corpora, on the other hand, tend to be much smaller and are therefore often unable to offer the same level of recurrence of individual words and phrases as their written counterparts. In addition, analyzing the spoken discourse recorded in spoken corpora requires specific attention not just to single words but also to the patterns that words form in larger units. It is to the construction, design and development of spoken corpora, and to the corpus analysis of these larger patterns, that we now turn.