ABSTRACT

The clustvarsel and selvarclust approaches have been used with the family of Gaussian parsimonious clustering models. The variable selection for clustering and classification (VSCC) approach was developed for the same setting. The VSCC technique finds a subset of variables that simultaneously minimizes the within-group variance and maximizes the between-group variance, yielding variables that show separation between the desired groups. In general, the leftover variance (the variance not accounted for within the groups) must be calculated explicitly; however, if the data are standardized to have equal variance across variables, then any variable that minimizes the within-group variance also maximizes the leftover variance. Accordingly, Jeffrey Andrews and Paul D. McNicholas describe the VSCC method in terms of variables that are standardized to have zero mean and unit variance. C. Bouveyron and C. Brunet-Saumard provide an excellent review of work on model-based clustering of high-dimensional data.
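The standardization argument above can be illustrated with a minimal sketch. It is not the VSCC algorithm itself: it assumes hard group labels are already available, uses synthetic data, and the function name `within_group_variance` is hypothetical. It only shows that, once every variable has unit variance, the leftover variance is one minus the within-group variance, so ranking variables by the former is the same as ranking them by the latter.

```python
import numpy as np

def within_group_variance(X, labels):
    """Per-variable within-group variance of column-standardized data.

    Assumes hard group labels and no constant columns (illustrative only).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Standardize each variable to zero mean and unit (population) variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    n, p = Z.shape
    W = np.zeros(p)
    for g in np.unique(labels):
        Zg = Z[labels == g]
        # Sum of squared deviations from the group mean, per variable.
        W += ((Zg - Zg.mean(axis=0)) ** 2).sum(axis=0)
    return W / n

# Synthetic data: column 0 separates two groups, column 1 is pure noise.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
informative = rng.normal(loc=labels * 4.0, scale=1.0)
noise = rng.normal(size=100)
X = np.column_stack([informative, noise])

W = within_group_variance(X, labels)
leftover = 1.0 - W          # total variance is 1 after standardization
ranking = np.argsort(W)     # smallest within-group variance first
```

Because each standardized variable has total variance 1, sorting by ascending `W` and sorting by descending `leftover` give the same order, which is why the leftover variance never needs to be computed separately in this setting.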