Compiling a special purpose corpus
Now that you have learnt the basics of designing a special purpose corpus, we will turn our attention to issues relating to corpus compilation. Once you have mapped out the design of your ideal corpus, your next task is to identify and collect suitable texts for inclusion in it.

At this point you may run into practical problems that make it difficult to build your ideal corpus. For example, you may not be able to find all the texts you need in electronic form. Identifying and downloading texts from the Web or from CD-ROMs may take longer than expected, or you may not have copyright permission to hold certain texts in your corpus (see discussion on page 59). This means that you may have to be willing to make some adjustments to your ideal design.

It is important to be realistic and to balance the time and effort required to construct your corpus against the benefits you will gain by consulting it. For instance, if you are building the corpus to help you with one short class assignment, then spending a month constructing it is probably not a good use of your time. If you plan to use the corpus throughout the academic year, however, you can justify spending longer on its construction.

In addition, remember that a corpus can still be a useful resource even if it does not perfectly resemble the ideal corpus you planned during the design stage. The most important thing is to be aware of any shortcomings your corpus may have (e.g. some of the texts are somewhat dated, or some of the authors’ credentials are unclear) and to keep these in mind when interpreting the data.