ABSTRACT

A common problem in the use of multiple regression when analysing clinical and epidemiological data is the occurrence of explanatory variables (covariates) that are not statistically independent; that is, where correlations amongst covariates are not zero (Glantz and Slinker 2001). Most textbooks emphasise that there should be no significant associations between covariates, as these give rise to the problem known as collinearity (Slinker and Glantz 1985; Pedhazur 1997; Chatterjee et al. 2000; Glantz and Slinker 2001; Maddala 2001; Miles and Shelvin 2001). When more than two covariates are correlated, the problem is known as multicollinearity. Note that the original definition of collinearity and multicollinearity is stricter: at least one covariate can be expressed exactly as a linear combination of the others (Maddala 2001). For example, suppose the statistical model for Y regressed on p covariates X1, X2, …, and Xp is given as

$$Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p + e,$$

then collinearity means that at least one covariate Xi can be expressed as

$$X_i = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_{i-1} X_{i-1} + a_{i+1} X_{i+1} + \cdots + a_p X_p.$$
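
The strict definition above can be illustrated numerically. The following is a minimal sketch in Python with NumPy, not part of the original text: the data are synthetic, and the dependence X3 = 1 + 2·X1 − X2 is a hypothetical choice made purely for illustration. When one covariate is an exact linear combination of the others, the design matrix loses rank, and regressing that covariate on the remaining ones leaves essentially no residual.

```python
import numpy as np

# Synthetic data; the seed, sample size and coefficients are illustrative assumptions.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Hypothetical exact linear dependence: X3 = a0 + a1*X1 + a2*X2
# with a0 = 1, a1 = 2, a2 = -1.
x3 = 1.0 + 2.0 * x1 - x2

# Design matrix with an intercept column and the three covariates.
X = np.column_stack([np.ones(n), x1, x2, x3])

# Exact collinearity makes the design matrix rank deficient:
# four columns, but only three are linearly independent.
rank = np.linalg.matrix_rank(X)

# Regressing X3 on the intercept, X1 and X2 recovers a0, a1, a2
# and leaves a residual that is zero up to floating-point error.
coef, *_ = np.linalg.lstsq(X[:, :3], x3, rcond=None)
residual_norm = np.linalg.norm(X[:, :3] @ coef - x3)

print("columns:", X.shape[1], "rank:", rank)
print("recovered coefficients:", np.round(coef, 6))
print("residual norm:", residual_norm)
```

In practice, covariates are rarely exactly collinear; they are more often strongly but imperfectly correlated. In that case the design matrix keeps full rank but becomes ill-conditioned, which this rank check alone would not reveal.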