ABSTRACT

Chapter 4 extends many of the principles introduced in Chapters 2 and 3. Specifically, we saw that the regularization techniques applied to models with normally distributed dependent variables, including the lasso, ridge, and elastic net, can be extended in a straightforward manner to cases where the response variable is categorical. Thus, researchers working with dichotomous outcomes that require logistic regression can use penalized estimators in situations where such techniques are particularly helpful (e.g., high-dimensional data, collinearity). Similarly, logistic regression models for categorical variables with three or more categories, whether ordered or unordered, can also be regularized. Finally, we saw in Chapter 4 that models involving count data, which call for the Poisson distribution, as well as those modeling time until an event occurs (Cox regression), can also be estimated using regularization techniques, including the lasso, ridge, and elastic net. In short, virtually all of the approaches outlined in Chapter 3 can be easily extended to GLiMs using the glmnet and associated libraries in R.
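To make this concrete, the following is a minimal sketch, not taken from the chapter, of how a lasso-penalized logistic regression might be fit with the glmnet library; the simulated data and object names (x, y, cv_fit) are illustrative assumptions rather than examples from the text.

library(glmnet)

set.seed(123)
x <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)         # 10 candidate predictors (hypothetical data)
y <- rbinom(200, size = 1, prob = plogis(x[, 1] - x[, 2]))  # dichotomous outcome (hypothetical data)

# alpha = 1 requests the lasso; alpha = 0 the ridge; intermediate values, the elastic net.
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

coef(cv_fit, s = "lambda.min")  # penalized coefficients at the cross-validation-selected lambda

Swapping in family = "poisson" (count outcomes), family = "multinomial" (unordered categories), or family = "cox" (time-to-event data, with y supplied as a survival object) extends the same call to the other models discussed in the chapter.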

In the next chapter, we will turn our attention from the univariate context, in which there is a single dependent variable, to situations in which we have multivariate data; i.e., multiple dependent variables. As we will see in Chapter 5, familiar modeling techniques such as multivariate regression, canonical correlation, and discriminant analysis can all be regularized using the principles outlined in Chapter 2. Furthermore, these methods can be carried out quite easily in R using readily available libraries and functions.