ABSTRACT

In this chapter you will learn how the functions introduced in Chapter 3 can be applied to the central tasks of a corpus linguist. These tasks are retrieving and processing linguistic data to produce frequency lists and concordances, working with collocations, computing dispersion statistics, and, more generally, working with a variety of different corpora. There are so many corpus formats, and so many tasks you might wish to perform, that we cannot look at all possible combinations. Rather, I will exemplify how to perform many different tasks on the basis of several frequently used corpora and corpus formats. A few of the case studies were part of the additional assignments of the first edition. However, you will see that there are many new ones, and even those that were posted before have been rewritten (they often run much faster now) and are now heavily commented. The case studies below are roughly grouped according to the kind of corpus-linguistic task they involve. This grouping is only a heuristic for two reasons. First, several scripts involve tasks that do not fit neatly into any of the categories frequency lists, dispersion, collocation, and concordancing. Second, several scripts involve more than one of these aspects. Taken together, the case studies cover a wide range of tasks in the hope that you can generalize from them to your own applications. Within each group, the case studies are ordered by difficulty, but this ordering, too, is only approximate because difficulty is hard to operationalize objectively.