ABSTRACT

This chapter aims to provide a large-scale collection of digital tape recordings of Cantonese speech and to establish an archive of Cantonese texts based on transcriptions of these recordings. It describes a corpus of Cantonese syllables and words, together with other polysyllabic Chinese expressions. The chapter considers the generation of relevant lexical information on Cantonese Chinese speech, the determination of the processing and production unit of Cantonese speech, and the estimation of the extent of code-switching in Hong Kong. Sources of natural Cantonese speech include dialogues from radio call-in programs, conversations from TV programs, and casual chats among students in a canteen. The chapter also provides useful information on the pervasive code-switching in Hong Kong, which has clearly confounded traditional language teaching methods in the Hong Kong education sector.