ABSTRACT

In this chapter, we present the analysis for data classified by a single explanatory factor. In the context of designed experiments, this would correspond to the case of a completely randomized design (Section 3.3.1) with a single treatment factor. Equivalently, this type of data might arise from an observational study in which the observations have been selected to conform to a single pre-defined classification, or grouping variable. In both cases, the only structure in the data is the treatment or grouping factor; there must be no other explanatory variables and no other structure, such as blocking or pseudo-replication, associated with the experimental material. If any such structure is present, then you should use a more complex analysis (see Chapters 7, 9 and 16 for details). In the case where only a single factor – representing treatments or groups – is present, the aim of the analysis is to discover if there are any differences in response between the factor levels. For brevity, here we use the term ‘treatments’ to cover either a set of imposed treatments or a set of observed groups. The first step in the analysis is to write down a model for the data in terms of the unknown population mean for each treatment (Section 4.1). The principle of least squares is used to estimate these treatment means (Section 4.2). The technique of ANOVA is then used to partition the variation in the data (Section 4.3). This analysis serves several purposes: we can obtain an estimate of the background variation, which in turn is used to indicate uncertainty on estimates of treatment means; we can also obtain an estimate of the amount of variation in the data accounted for by treatment differences, and compare this with the background variation. If the variation between treatments is large compared with the background variation, then we conclude that substantive differences between treatments are present in the data.
This comparison is formalized in an F-test, and differences between pairs of treatment means can be compared with the standard error of the difference (SED) to identify significant differences between responses for different treatments (Section 4.4). There are several forms (parameterizations) of the ANOVA model for a single treatment factor, and we explain some of the different forms used in statistical software (Section 4.5).
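
The steps summarized above can be sketched in a few lines of Python. This is an illustration only and is not taken from the chapter: the data values are hypothetical, and the names `one_way_anova` and `sed` are assumptions introduced here. The sketch estimates each treatment mean by its sample mean (the least squares estimate for this model), partitions the total sum of squares into treatment and residual parts, forms the F-statistic as the ratio of the two mean squares, and computes the SED for a pair of treatment means.

```python
from statistics import mean

# Hypothetical illustrative data: three treatments with four replicates each.
data = {
    "A": [10.0, 12.0, 11.0, 11.0],
    "B": [14.0, 15.0, 13.0, 14.0],
    "C": [10.0, 9.0, 11.0, 10.0],
}


def one_way_anova(groups):
    """Partition the variation for data classified by a single factor."""
    all_obs = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_obs)

    # Least squares estimates of the treatment means are the sample means.
    means = {name: mean(ys) for name, ys in groups.items()}

    # Variation between treatments and background (residual) variation.
    ss_treat = sum(
        len(ys) * (means[name] - grand_mean) ** 2 for name, ys in groups.items()
    )
    ss_resid = sum(
        (y - means[name]) ** 2 for name, ys in groups.items() for y in ys
    )
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)

    df_treat = len(groups) - 1
    df_resid = len(all_obs) - len(groups)
    ms_treat = ss_treat / df_treat
    ms_resid = ss_resid / df_resid  # estimate of the background variance

    return {
        "means": means,
        "ss_treat": ss_treat,
        "ss_resid": ss_resid,
        "ss_total": ss_total,
        "df_treat": df_treat,
        "df_resid": df_resid,
        "ms_treat": ms_treat,
        "ms_resid": ms_resid,
        "f_stat": ms_treat / ms_resid,
    }


def sed(ms_resid, n1, n2):
    """Standard error of the difference between two treatment means."""
    return (ms_resid * (1.0 / n1 + 1.0 / n2)) ** 0.5


result = one_way_anova(data)
sed_ab = sed(result["ms_resid"], len(data["A"]), len(data["B"]))
print(result["means"])
print("F =", result["f_stat"], "on", result["df_treat"], "and", result["df_resid"], "df")
print("SED (A vs B) =", sed_ab)
```

The F-statistic would then be referred to an F-distribution with the stated degrees of freedom, and a difference between two treatment means would be judged against its SED; neither step is shown here, since both need distribution quantiles that the sketch deliberately leaves out.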