ABSTRACT

The focus of this chapter is on model-based clustering (Fraley and Raftery, 2002) through finite mixture models. We begin by noting that there are now many approaches to robust finite mixture modeling. Some of them rely on flexible mixture components, as in Banfield and Raftery (1993), McLachlan and Peel (2000), and Frühwirth-Schnatter and Pyne (2010). The usual assumption that each mixture component follows a Gaussian distribution is commonly replaced by one in which some, or even all, components follow more flexible distributions. These may have heavier tails than the Gaussian, so that extreme observations become less unusual under the model; a common choice is the multivariate t distribution. Skewed distributions can instead be used to accommodate unusual cluster shapes. Another possibility is to add one or more components that represent an outlier-generating distribution. Contamination by background noise, for instance, is often formally defined through a multivariate uniform distribution (often with independent coordinates). A related approach is that of Fraley and Raftery (1998), who propose a Gaussian mixture with an additional component, modeled as a Poisson process, to handle noisy data. Flexible mixtures are very interesting, but in many cases they lack formal robustness properties: it can easily be shown that their global and local robustness properties are very similar to those of conventional Gaussian mixture models. An underlying difference from other robust methods is that, in robust statistics, contamination is usually regarded as something unusual under the assumed model. Hence an “anticipated” contamination is not contamination, but simply a non-Gaussian model expected for some of the observations.
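
As an illustration of the additional-component idea described above, one common way to write a Gaussian mixture with a uniform noise term is sketched below. The notation is an assumption introduced here and is not taken from the chapter: G is the number of clusters, \pi_0 is the noise proportion, \phi is the multivariate Gaussian density, and V is the hypervolume of the region over which the noise is taken to be uniform.

```latex
% Sketch only: Gaussian mixture with a constant-density noise component.
% All symbols (G, \pi_g, \mu_g, \Sigma_g, V) are illustrative assumptions.
f(x) \;=\; \frac{\pi_0}{V}
      \;+\; \sum_{g=1}^{G} \pi_g \, \phi(x;\, \mu_g, \Sigma_g),
\qquad
\pi_g \ge 0, \quad \sum_{g=0}^{G} \pi_g = 1 .
```

Under this formulation the noise is part of the assumed model, which is why observations assigned to the constant-density term are "anticipated" rather than contaminating in the robust-statistics sense.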