ABSTRACT

The goal of this chapter was to introduce the theoretical underpinnings of the various regularization techniques that we will be applying in subsequent chapters. As we will see, these estimators can be incorporated into a wide array of statistical tools, including linear models, generalized linear models, multivariate models, dimension reduction, and multilevel statistical methods, among others. Although they were developed in the context of regression, they have since been applied to many other types of problems. We saw that the basic family of regularization methods, including the ridge, lasso, and elastic net estimators, shares a common framework based on penalizing model complexity. Models with more non-zero coefficients will tend to fare worse in the selection process than models with fewer. At the same time, however, these regularization techniques reward accurate prediction of the response variable(s), meaning that models with too few non-zero coefficients will also fare poorly. This balance of model parsimony and prediction accuracy lies at the heart of all regularization methods.
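To make this shared framework concrete, the penalized objective underlying these estimators can be sketched as follows; the particular notation, including the elastic net mixing parameter alpha, is an illustrative assumption rather than a reproduction of the chapter's own formulas.

% A sketch of the common penalized estimation framework: a fit term plus a
% complexity penalty. Setting \alpha = 0 yields the ridge penalty, \alpha = 1
% the lasso, and intermediate values the elastic net.
\begin{equation*}
  \hat{\beta} = \underset{\beta}{\arg\min}\;
    \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^{2}
    + \lambda \sum_{j=1}^{p} \Big[ \alpha\,|\beta_j|
    + \tfrac{1}{2}\,(1-\alpha)\,\beta_j^{2} \Big]
\end{equation*}

The first term rewards accurate prediction of the response, while the second grows with the number and size of the non-zero coefficients, which is the balance of parsimony and accuracy described above.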

In this chapter, we also examined common extensions of the lasso, including an approach that allows variables to be grouped in coherent ways, and an adaptation designed to reduce the estimation bias commonly associated with regularization methods. We then described a Bayesian alternative to the standard estimation approach, which allows prior information about the parameters to be incorporated and makes model inference somewhat more straightforward. Finally, we discussed more directly the difficulties of conducting inference with regularized estimators, along with a variety of approaches for dealing with these issues. We are now ready to move forward in the next chapter with a demonstration of how these methods can be applied to a standard linear model.
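As a brief illustration of the bias-reducing adaptation mentioned above, which we take here to be the adaptive lasso, the following Python sketch shows the usual two-stage implementation; the use of scikit-learn, the tuning values, and the simulated data are assumptions made purely for illustration and are not the chapter's own code.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# Simulated data with only a few truly informative predictors (illustrative values).
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=1)

# Stage 1: an initial ridge fit supplies rough coefficient magnitudes.
gamma = 1.0
ridge = Ridge(alpha=1.0).fit(X, y)
weights = 1.0 / (np.abs(ridge.coef_) ** gamma + 1e-8)  # larger weight = heavier penalty

# Stage 2: a weighted lasso, implemented by rescaling each column by 1/weight,
# so that predictors with small initial coefficients are penalized more heavily
# while large coefficients are shrunk less, reducing the lasso's estimation bias.
lasso = Lasso(alpha=0.5).fit(X / weights, y)
adaptive_coef = lasso.coef_ / weights

print("Number of non-zero coefficients:", int(np.sum(adaptive_coef != 0)))

In practice, the two penalty values fixed above would be chosen by cross-validation rather than set by hand.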