ABSTRACT

Linear regression, a staple of classical statistical modeling, is one of the simplest algorithms for doing supervised learning. Though it may seem somewhat dull compared to some of the more modern statistical learning approaches, linear regression is still a useful and widely applied statistical learning method. The presence of collinearity can pose problems in the ordinary least squares, since it can be difficult to separate out the individual effects of collinear variables on the response. In fact, collinearity can cause predictor variables to appear as statistically insignificant when in fact they are significant. Linear regression models provide a very intuitive model structure as they assume a monotonic linear relationship between the predictor variables and the response. Although extensions of linear regression that integrate dimension reduction steps into the algorithm can help address some of the problems with linear regression, more advanced supervised algorithms typically provide greater flexibility and improved accuracy.