ABSTRACT

This chapter describes the corpora and search tools used in the present study. The availability of balanced corpora has greatly facilitated the comparison of texts of different types. The corpora available for Chinese vary in size, format and even quality. The chapter first surveys the Chinese corpora and then introduces CQPweb, the online search interface used to query them. The Lancaster Corpus of Mandarin Chinese (LCMC) and several other Brown family corpora (UCLA, ZCTC) are available online at the Beijing Foreign Studies University CQPweb site, which hosts 41 corpora in total, both Chinese and English. Of these, ZCTC includes translated texts in 15 registers, making possible a preliminary study of stylistic variation in translational Chinese. The Corpus of Contemporary American English (COCA) is the largest freely available corpus of English and the only large, balanced corpus of American English.