ABSTRACT

Abstract
29.1 Introduction
29.2 Robustness in Cluster Analysis
    29.2.1 L1 Approaches
    29.2.2 Approaches Based on Trimming
    29.2.3 Mixture Approaches for Robust Clustering and Modeling Noise
    29.2.4 Robust Fuzzy Cluster Analysis
    29.2.5 Robust Nonmodel-Based Cluster Analysis
    29.2.6 Parameter-Based Measurement of Robustness in Cluster Analysis
    29.2.7 Assignment-Based and Other Robustness Measures
29.3 Software for Robust Cluster Analysis
29.4 Conclusion
Acknowledgment
References

Unexpected deviations from assumed models, as well as the presence of some amount of outlying data, are common in most practical statistical applications, and they can lead to undesirable solutions when nonrobust statistical techniques are applied. Cluster analysis is no exception: the search for homogeneous groups with large heterogeneity between them can be spoiled by the lack of robustness of standard clustering methods. For instance, the presence of even a few outlying observations may cause heterogeneous clusters to be artificially joined together, or may lead to the detection of spurious clusters made up merely of outlying observations. In this chapter, we analyze the effects of different kinds of outlying data in cluster analysis and explore several alternative methodologies designed to avoid or minimize their undesirable effects.
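
As a small illustration of the failure mode described above, the following sketch applies an exact one-dimensional 2-means criterion (used here only as one example of a nonrobust standard method; the data values and the function names `sse` and `best_two_means_split` are hypothetical choices made for this illustration) to two well-separated groups, with and without a single outlying observation.

```python
def sse(points):
    """Within-cluster sum of squared deviations from the cluster mean."""
    if not points:
        return 0.0
    mean = sum(points) / len(points)
    return sum((x - mean) ** 2 for x in points)


def best_two_means_split(data):
    """Exact 2-means for one-dimensional data.

    In one dimension the optimal clusters are contiguous in sorted order,
    so it suffices to evaluate every split point of the sorted sample.
    """
    xs = sorted(data)
    best = min(range(1, len(xs)), key=lambda i: sse(xs[:i]) + sse(xs[i:]))
    return xs[:best], xs[best:]


# Hypothetical toy data: two tight groups near 0 and near 10.
clean = [0.0, 0.5, 1.0, 9.0, 9.5, 10.0]
# The same data with one gross outlier added (an assumed contamination).
contaminated = clean + [1000.0]

clean_split = best_two_means_split(clean)
contaminated_split = best_two_means_split(contaminated)

print("clean:       ", clean_split)
print("contaminated:", contaminated_split)
```

Under these assumptions, the clean sample is split into its two groups, whereas in the contaminated sample the single outlier forms a spurious cluster of its own and the two genuine groups are joined together. This is only a toy case and not a general statement about every clustering method.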