ABSTRACT
Abstract
22.1 Introduction
    22.1.1 Improved Quality of Solution
    22.1.2 Robust Clustering
    22.1.3 Model Selection
    22.1.4 Knowledge Re-Use
    22.1.5 Multiview Clustering
    22.1.6 Distributed Computing
22.2 The Cluster Ensemble Problem
22.3 Measuring Similarity between Clustering Solutions
22.4 Cluster Ensemble Algorithms
    22.4.1 Probabilistic Approaches to Cluster Ensembles
        22.4.1.1 A Mixture Model for Cluster Ensembles
        22.4.1.2 Bayesian Cluster Ensembles
        22.4.1.3 Nonparametric Bayesian Cluster Ensembles
    22.4.2 Pairwise Similarity-Based Approaches
        22.4.2.1 Methods Based on the Ensemble Co-Association Matrix
        22.4.2.2 Relating Consensus Clustering to Other Optimization Formulations
    22.4.3 Direct Approaches Using Cluster Labels
        22.4.3.1 Graph Partitioning
        22.4.3.2 Cumulative Voting
22.5 Combination of Classifier and Clustering Ensembles
22.6 Applications of Consensus Clustering
    22.6.1 Gene Expression Data Analysis
    22.6.2 Image Segmentation
22.7 Concluding Remarks
Acknowledgment
References
This chapter describes the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determine these partitionings, popularly known as the problem of “consensus clustering.” We illustrate different algorithms for solving the consensus clustering problem. The notion of dissimilarity between a pair of clustering solutions plays a key role in designing any cluster ensemble algorithm, and a summary of such dissimilarity measures is also provided. We also cover recent efforts on combining classifier and clustering ensembles, leading to new approaches for semisupervised learning and transfer learning. Finally, we describe several applications of consensus clustering.
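As a rough illustration of this setting, the sketch below works only from cluster labels, never from the original features. It assumes each partitioning is given as a list of labels, one per object. The function names `co_association` and `rand_index` and the toy label vectors are hypothetical, chosen for this example. The Rand index is used here as one widely known similarity between two clusterings (one minus it gives a dissimilarity); it is not necessarily the measure the chapter favors.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitionings in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Normalize the integer counts by the number of partitionings.
    return [[c / m for c in row] for row in counts]


def rand_index(a, b):
    """Fraction of object pairs on which two partitionings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Three hypothetical partitionings of four objects.
p1 = [0, 0, 1, 1]
p2 = [1, 1, 0, 0]  # same grouping as p1 with the label names swapped
p3 = [0, 0, 0, 1]

co = co_association([p1, p2, p3])
similarity_12 = rand_index(p1, p2)
similarity_13 = rand_index(p1, p3)
```

Because both functions compare only whether two objects share a label, they are unaffected by how the clusters happen to be named, which is why `p1` and `p2` count as identical. That matters in the ensemble setting, since labels from different partitionings have no fixed correspondence.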