Conducting Research With Imperfect Data | 21

ABSTRACT

This chapter considers two common situations and the statistical principles that can be applied to them. It discusses the task of reviewing random subsamples when the initial distribution of data across the dependent variable is unequal, and it discusses reviewing cases as part of the abstraction process itself, on a ‘first pass’ through the data. The latter method is useful when it is strongly suspected that the combination of corpus annotation and queries is insufficient for a first pass. The independent variable becomes ‘embedded versus sequential’, and the dependent variable is whether or not the additive step is performed. The method relies on the dataset being complete. The worst possible outcome is merely that we have to read and check every instance in the dataset, reallocating as necessary. ‘The golden law of data’ says that we must be able to defend our data as sound and complete, because research is only as good as the data on which it is based.