ABSTRACT

Multiple linear regression generalizes the simple linear regression procedure described in Chapter 3 to more than a single explanatory variable. It is now the relationship between a response variable and several explanatory variables that becomes of interest. The adjective “multiple” indicates that at least two explanatory variables are involved in the modeling exercise. At the outset, it is important to note that the explanatory variables are, strictly, assumed to be fixed and under the control of the investigator; that is, they are not considered to be random variables, and only the response variable is considered to be random. In practice, of course, this assumption is unlikely to be true, in which case the results from a multiple linear regression are interpreted as being conditional on the observed values of the explanatory variables, and the inherent variation in the explanatory variables is ignored. Because there are no distributional assumptions about the explanatory variables, they may be binary, categorical with more than two categories (such variables need to be coded in an appropriate way; see Exercise 4.2 and Chapters 5 and 6), ordered categorical, or interval. The goals of a multiple regression may be to determine whether the response variable and one or more explanatory variables are associated in some systematic way, to predict values of the response variable from values of the explanatory variables, or both.
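As a rough illustration of the points above, the following sketch fits a multiple linear regression by ordinary least squares to a small made-up data set containing one interval explanatory variable and one three-category explanatory variable. The data, the variable names (x1, group, true_beta), and the choice of dummy coding with "a" as the reference category are all assumptions made for this example; they are not taken from the chapter, and other coding schemes are possible.

```python
import numpy as np

# Hypothetical toy data (invented for illustration only):
# one interval explanatory variable and one three-category variable.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
group = np.array(["a", "b", "c", "a", "b", "c"])

# Dummy-code the categorical variable, treating "a" as the reference
# category: k categories need k - 1 indicator columns.
d_b = (group == "b").astype(float)
d_c = (group == "c").astype(float)

# Design matrix: intercept column, interval variable, two indicators.
# The explanatory variables are treated as fixed, known values.
X = np.column_stack([np.ones_like(x1), x1, d_b, d_c])

# Noise-free response built from assumed coefficients, so that the
# least-squares fit can be checked against known values.
true_beta = np.array([2.0, 0.5, 1.0, -1.0])
y = X @ true_beta

# Ordinary least-squares estimates of the regression coefficients.
beta_hat, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# Prediction for a new observation with x1 = 7 in category "c".
x_new = np.array([1.0, 7.0, 0.0, 1.0])
y_new = x_new @ beta_hat

print("estimated coefficients:", beta_hat)
print("predicted response:", y_new)
```

The estimated coefficients address the first goal mentioned above (assessing association between the response and the explanatory variables), and the final lines address the second (prediction). With real data the response would include random variation, so the estimates would not reproduce the generating coefficients exactly.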