Corpus analysis | 5 | Data basis and methodology | Sven Leuckert

ABSTRACT

This chapter presents the data basis for the study of topicalization in Asian Englishes and describes the methodology applied. The study is rooted in corpus linguistics and draws on the components of the International Corpus of English (ICE) for Hong Kong, India, the Philippines, Singapore, and Great Britain. More precisely, the analysis covers the ICE sub-corpora containing direct conversations, phone calls, and classroom lessons. Spontaneous spoken language was chosen because topicalization has previously been identified as a vernacular feature of Asian Englishes. Furthermore, since topicalization can be used to shape the discourse between interlocutors, (mostly) informal spoken language was selected. In total, approximately 1.2 million words were read and annotated manually for criteria such as syntactic form and function, discourse function, interaction with other syntactic processes, and givenness.