ABSTRACT

Multicollinearity, also termed collinearity or ill-conditioning, generally refers to a data problem in which two or more predictor variables are highly correlated or linearly dependent; one of them may be nearly a linear combination of the other predictor variables (cf. Belsley, Kuh, and Welsch 1980, Chapter 3). In multiple regression, this problem manifests as high correlations or dependency among predictor variables. From a practical standpoint, multicollinearity makes it difficult to interpret the unique influence of a given predictor variable on the dependent variable because the predictor variable carries information that is redundant with the other predictor variables (Stevens 2009, p. 74). A more direct consequence of multicollinearity, however, is computational. That is, multicollinearity renders the cross-products matrix of predictor variables (nearly) singular, so that the matrix cannot be inverted (i.e., the matrix inversion problem) or the calculation of its inverse becomes less accurate. This in turn affects least-squares estimation, which requires the inverse of the cross-products matrix. Another consequence is that the variance of a parameter estimate tends to be large. An inflated variance is likely to lead to inferential error, for example, by underestimating the t statistic of the parameter estimate. It can also cause the mean square error of the estimate to be high, thereby causing the estimate to deviate farther from the true parameter on average.
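
The following is a minimal numerical sketch, not taken from the paper, that illustrates the two consequences described above under an assumed toy design: when one predictor is nearly a copy of another, the cross-products matrix X'X becomes ill-conditioned and the diagonal of sigma^2 (X'X)^{-1}, the sampling variance of the least-squares estimates, inflates dramatically. The variable names and the noise scale (0.01) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)     # x2 is nearly a linear copy of x1
X = np.column_stack([np.ones(n), x1, x2])

xtx = X.T @ X
print("condition number of X'X:", np.linalg.cond(xtx))   # very large -> ill-conditioned

# Var(beta_hat) = sigma^2 * (X'X)^{-1}; its diagonal blows up under collinearity
sigma2 = 1.0
var_beta = sigma2 * np.linalg.inv(xtx)
print("slope variances (collinear design):", np.diag(var_beta)[1:])

# For comparison, an independent second predictor yields much smaller variances
x2_indep = rng.normal(size=n)
X_ok = np.column_stack([np.ones(n), x1, x2_indep])
var_ok = sigma2 * np.linalg.inv(X_ok.T @ X_ok)
print("slope variances (independent design):", np.diag(var_ok)[1:])
```

In the collinear design the slope variances are several orders of magnitude larger than in the independent design, which is exactly the variance inflation and consequent loss of t-statistic power that the abstract describes.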