ABSTRACT

In some sense at least, this book is an introduction to corpus linguistics. If you are a little familiar with the field, this probably immediately triggers the question “Why yet another introduction to corpus linguistics?” This is a valid question because, given the upsurge of studies using corpus data in linguistics, there are already quite a few very good introductions available. Do we really need another one? Predictably, I think the answer is still “yes” and “yes, even a second edition,” and the reason is that this introduction is radically different from every other introduction to corpus linguistics out there. For example, there are a lot of things that are regularly dealt with at length in introductions to corpus linguistics that I will not talk about much:

• the history of corpus linguistics: Kaeding, Fries, the early one-million-word corpora, up to the contemporary giga corpora and the still lively web-as-corpus discussion;

• how to compile corpora: size, sampling, balancedness, representativity;

• how to create corpus markup and annotation: lemmatization, tagging, parsing;

• kinds and examples of corpora: synchronic vs. diachronic, annotated vs. unannotated;

• what kinds of corpus-linguistic research have been done.