ABSTRACT

Our focus in Chapter 5 was on fitting regularized multivariate models to data. These models each have multiple dependent variables, which differentiates them from the univariate models that were the topic of Chapter 3. Despite this fundamental difference in structure, the underlying concepts discussed in Chapters 2 and 3 applied in Chapter 5 as well, and employing regularization techniques in R took much the same form as in the earlier chapters. In short, although the models to which we apply them differ, the basic principles of regularized estimation remained in place for the multivariate case, and applying them in R involved many of the same command structures used for univariate linear models (Chapter 3) and GLiMs (Chapter 4).

In Chapter 6, we will examine another family of multivariate techniques: cluster analysis. These methods are useful for identifying potential underlying subgroups within the population, based on scores for multiple observed variables. We will describe two broad approaches to clustering: one in which we begin the analysis with a hypothesis regarding the number of groups in the population (k-means), and another, agglomerative hierarchical clustering, in which we successively merge the observations from many clusters down to a single cluster and then use various statistical criteria to identify the optimal solution. These methods can be regularized so that only those variables that truly contribute to identifying the subgroups are retained in the analysis. Although the overall framework for clustering differs from that of the other multivariate methods we consider in this book (thereby necessitating its own chapter), the basic principles of regularization that we have discussed thus far will continue to prove useful in Chapter 6.
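To preview the two approaches, the following is a minimal base-R sketch on simulated data; the data set and all object names here are illustrative (the regularized, variable-selecting versions of these methods are developed in Chapter 6 itself).

```r
# Illustrative sketch using only base R: simulate two subgroups that
# differ in their means on two observed variables.
set.seed(123)
x <- rbind(matrix(rnorm(50, mean = 0), ncol = 2),
           matrix(rnorm(50, mean = 3), ncol = 2))

# k-means: the hypothesized number of clusters is specified up front.
km <- kmeans(x, centers = 2, nstart = 25)

# Agglomerative hierarchical clustering: begin with each observation as
# its own cluster, merge until one cluster remains, then cut the tree
# at the desired number of groups.
hc <- hclust(dist(x), method = "ward.D2")
cl <- cutree(hc, k = 2)

# Compare the two solutions.
table(kmeans = km$cluster, hierarchical = cl)
```

Note the conceptual contrast: `kmeans()` requires `centers` before the analysis begins, whereas `hclust()` produces the full merge history, and the number of groups is chosen afterward via `cutree()`.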