ABSTRACT

The HCRC Map Task Corpus comprises over 16 hours of spoken dialogue, recorded in acoustically controlled conditions, together with careful orthographic transcriptions. A digitally sampled form of the dialogues, the transcripts, and a substantial amount of ancillary material have been published on eight CD-ROMs for distribution to the research and development communities. In this chapter we discuss the problems we faced, the lessons we learned, and the issues that remain for enterprises of this sort, in the hope that those who come after can benefit from our experience.