ABSTRACT

This chapter describes a method, derived from first principles, that makes no assumptions about why cases may not be independent. It observes whether they are distributed as if they were independent, and factors this observation into our model of variability. The chapter measures the variance of observed proportions between text subsamples using two different models: one that assumes each text is a random sample, and another that examines the distribution of actual subsample scores. The confidence interval width has increased by a further third. This new sample size might be thought of as a hypothetical random sample supporting the observation, based on our actual corpus sample and its known distribution per text. There is a flaw in this method, which becomes more serious with small samples. The ‘Binomial’ per-text distribution is really the sum of multiple Binomial distributions, one for each sample size.
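The comparison of the two variance models might be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the chapter's own formulae: the function names, the input lists `hits` and `sizes`, the use of the pooled proportion, and the averaging of the Binomial variance over each text's sample size are all choices made here for illustration.

```python
from statistics import mean


def variance_ratio(hits, sizes):
    """Compare predicted and observed between-text variance of a proportion.

    hits[i]  -- cases of the studied type in text i (hypothetical input)
    sizes[i] -- all relevant cases in text i (hypothetical input)
    Returns (p, predicted, observed, ratio).
    """
    if len(hits) != len(sizes) or len(sizes) < 2:
        raise ValueError("need matching lists covering at least two texts")

    n = sum(sizes)
    p = sum(hits) / n                      # pooled proportion over the corpus
    props = [h / s for h, s in zip(hits, sizes)]

    # Model 1 (assumption): each text is a random sample, so its proportion
    # has Binomial variance p(1 - p) / n_i. Averaging over the texts treats
    # the per-text distribution as a mix of Binomials, one per sample size.
    predicted = mean(p * (1 - p) / s for s in sizes)

    # Model 2: variance of the subsample proportions actually observed.
    observed = sum((q - p) ** 2 for q in props) / (len(props) - 1)
    if observed == 0:
        raise ValueError("observed variance is zero; ratio is undefined")

    return p, predicted, observed, predicted / observed


def adjusted_sample_size(n, ratio):
    """Scale n by the variance ratio (assumed form of the adjustment)."""
    return n * ratio


if __name__ == "__main__":
    hits, sizes = [2, 8], [10, 10]         # toy data, not from the chapter
    p, predicted, observed, ratio = variance_ratio(hits, sizes)
    print(p, predicted, observed, ratio)
    print(adjusted_sample_size(sum(sizes), ratio))
```

A ratio below 1 means the texts vary more than random sampling would predict, so the hypothetical random sample is smaller than the actual one and any interval based on it is wider.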