ABSTRACT

Consider a designed experiment in which observations for a response variable, $Y$, to a particular factor (a categorical variable) are grouped under $p$ predetermined or fixed factor levels (or treatments). For the models under consideration in this chapter, the continuous response variable, $Y$, is viewed as being (potentially) dependent on the categorical explanatory variable defined by the factor. Let $i = 1, 2, \ldots, p$ denote treatment identifiers and let $j = 1, 2, \ldots, n_i$ identify the $j$th observed response within the $i$th treatment. Denote the $i$th treatment sample size by $n_i$, the total number of observed responses by $n$, and the $j$th observed response under the $i$th treatment by $y_{ij}$. Assuming random deviations exist in the observed responses, let $\varepsilon_{ij}$ represent the random error term corresponding to the $j$th observed response within the $i$th treatment. The model can be written in two equivalent ways. The treatment (or cell) means model has the appearance

 =  +   where the parameter  represents the true mean response under treatment . Alternatively, let the parameter  represent the unknown overall mean

response and denote the unknown mean treatment effect under treatment  by the parameter   . Using the decomposition  = +   in the treatment means model, the treatment effects model is given by

 = +   +   As for linear regression models, both models can be expressed in matrix form,

y = Xβ + ε and it can be shown that the design matrix for the treatment means model has full column rank. The design matrix for the treatment effects model, however, does not have full column rank. A common remedy for this issue is to make use of an appropriate restriction on the effects parameters. The unweighted mean condition imposes the restriction

P =1   = 0 and the weighted mean

condition imposes the restriction P

=1   = 0, where the weights,  , satisfy

P =1 = 1

Alternatively, a coding scheme for the factor variable can be used that results in an equivalent, adjusted model for which the design matrix does have full column rank. This approach is the default scheme used by R. Just as for linear regression models, there are analogous assumptions on

the error terms. For all $i = 1, 2, \ldots, p$ and $j = 1, 2, \ldots, n_i$, it is assumed that the error terms are independent and identically normally distributed, with $\varepsilon_{ij} \sim \mathrm{N}(0, \sigma^2)$. If the factor is recorded as a categorical variable, data for such an

experiment might have the appearance of the data shown in Table 11.1. The data are stored in the file Data11x01.R as a data frame named OneWayData. For future reference, when the treatment sample (or cell) sizes are all equal, the experiment is referred to as having a balanced design; otherwise, the design is unbalanced. The R functions that perform the necessary computations handle both balanced and unbalanced designs.
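
The ideas above can be sketched in R on simulated data. This is a minimal sketch under stated assumptions: the data frame OneWaySketch, the variable names y and TR, the cell sizes, and the treatment means are all hypothetical and are not the contents of Data11x01.R. It checks whether the design is balanced, compares the column rank of the design matrices for the two models, and fits the model under R's default (treatment) coding.

```r
# Hypothetical one-factor data; names and values are illustrative only.
set.seed(11)
sizes <- c(4, 6, 5)                          # unequal cell sizes (unbalanced)
TR <- factor(rep(seq_along(sizes), times = sizes))
mu <- c(10, 12, 9)                           # assumed true treatment means
y <- mu[as.integer(TR)] + rnorm(sum(sizes), mean = 0, sd = 1)
OneWaySketch <- data.frame(y = y, TR = TR)

# Cell sizes; the design is balanced only if all counts agree.
cell_sizes <- table(OneWaySketch$TR)
is_balanced <- length(unique(as.vector(cell_sizes))) == 1

# Design matrix for the treatment means model: one indicator per level.
X_means <- model.matrix(~ TR - 1, data = OneWaySketch)

# Unrestricted treatment effects model: intercept plus all indicators.
X_effects <- cbind(1, X_means)

# R's default coding drops the indicator for the first level.
X_default <- model.matrix(~ TR, data = OneWaySketch)

rank_means <- qr(X_means)$rank               # equals ncol(X_means)
rank_effects <- qr(X_effects)$rank           # one less than ncol(X_effects)
rank_default <- qr(X_default)$rank           # equals ncol(X_default)

# The two full-rank parameterizations give the same fitted values.
fit_means <- lm(y ~ TR - 1, data = OneWaySketch)
fit_default <- lm(y ~ TR, data = OneWaySketch)
```

In fit_means the coefficients are the sample treatment means, while in fit_default the intercept is the first treatment's sample mean and the remaining coefficients are differences from it.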