ABSTRACT

In this chapter we discuss several violations of model assumptions in OLS and strategies for identifying and dealing with them. In particular, we consider how extreme data points can influence point estimates and how heteroskedasticity and multicollinearity can influence standard errors. Extreme data points can arise for various reasons, one of which is misspecification of the form of the relationship between the predictor and outcome (e.g., as linear rather than nonlinear). Heteroskedasticity means that the assumption of constant conditional variance is violated. Multicollinearity is an extreme case of correlation among predictors. Indeed, some correlation among predictors is needed in order to benefit from multiple regression’s ability to reduce bias: adding variables to a multiple regression model will not change the coefficient estimate of our predictor of interest unless the new variables are correlated at least to some degree with our predictor. At the same time, correlation among the predictors increases the standard errors of the coefficients. Very high correlation among predictor variables, or multicollinearity, will make our coefficient estimates quite imprecise.
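The last point, that very high correlation among predictors inflates standard errors, can be illustrated with a short simulation. This is a minimal sketch using NumPy; the data-generating process, sample size, and coefficient values are invented for illustration, and the standard errors are computed from the usual OLS formula rather than a statistics library:

```python
import numpy as np

def ols_se(X, y):
    """OLS coefficients and their standard errors, sqrt(diag(sigma^2 (X'X)^-1))."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)  # residual variance estimate
    return beta, np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)

# Nearly collinear second predictor (correlation with x1 about 0.995)
x2_col = x1 + 0.1 * rng.normal(size=n)
# Independent second predictor, for comparison
x2_ind = rng.normal(size=n)

# Same true model, y = 1 + x1 + x2 + error, under both designs
y_col = 1 + x1 + x2_col + rng.normal(size=n)
y_ind = 1 + x1 + x2_ind + rng.normal(size=n)

_, se_col = ols_se(np.column_stack([np.ones(n), x1, x2_col]), y_col)
_, se_ind = ols_se(np.column_stack([np.ones(n), x1, x2_ind]), y_ind)

print(f"SE of x1 coefficient, collinear design:   {se_col[1]:.3f}")
print(f"SE of x1 coefficient, independent design: {se_ind[1]:.3f}")
```

With a correlation near 0.995 between the two predictors, the standard error of the coefficient on x1 comes out roughly an order of magnitude larger in the collinear design than in the independent design, even though the true coefficients and the error variance are identical in both.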