ABSTRACT

Abstract
22.1 Introduction
    22.1.1 Improved Quality of Solution
    22.1.2 Robust Clustering
    22.1.3 Model Selection
    22.1.4 Knowledge Re-Use
    22.1.5 Multiview Clustering
    22.1.6 Distributed Computing
22.2 The Cluster Ensemble Problem
22.3 Measuring Similarity between Clustering Solutions
22.4 Cluster Ensemble Algorithms
    22.4.1 Probabilistic Approaches to Cluster Ensembles
        22.4.1.1 A Mixture Model for Cluster Ensembles
        22.4.1.2 Bayesian Cluster Ensembles
        22.4.1.3 Nonparametric Bayesian Cluster Ensembles
    22.4.2 Pairwise Similarity-Based Approaches
        22.4.2.1 Methods Based on the Ensemble Co-Association Matrix
        22.4.2.2 Relating Consensus Clustering to Other Optimization Formulations
    22.4.3 Direct Approaches Using Cluster Labels
        22.4.3.1 Graph Partitioning
        22.4.3.2 Cumulative Voting
22.5 Combination of Classifier and Clustering Ensembles
22.6 Applications of Consensus Clustering
    22.6.1 Gene Expression Data Analysis
    22.6.2 Image Segmentation
22.7 Concluding Remarks
Acknowledgment
References

This chapter describes the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determine these partitionings, popularly known as the problem of “consensus clustering.” We illustrate different algorithms for solving the consensus clustering problem. The notion of dissimilarity between a pair of clustering solutions plays a key role in designing any cluster ensemble algorithm, and a summary of such dissimilarity measures is also provided. We also cover recent efforts on combining classifier and clustering ensembles, leading to new approaches for semisupervised learning and transfer learning. Finally, we describe several applications of consensus clustering.
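
As a rough illustration of the problem setting, the sketch below works only with cluster labels, never with object features. It builds a co-association matrix (the fraction of base clusterings that place two objects in the same cluster) and computes one simple pair-counting dissimilarity between two clusterings, namely one minus the Rand index. The function names `co_association` and `pair_disagreement`, the toy labelings, and the choice of this particular dissimilarity are assumptions made for illustration; they are not taken from the chapter.

```python
from itertools import combinations


def co_association(labelings):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    base clusterings that assign objects i and j to the same cluster."""
    n = len(labelings[0])
    if any(len(lab) != n for lab in labelings):
        raise ValueError("all labelings must cover the same objects")
    r = len(labelings)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            agree = sum(1 for lab in labelings if lab[i] == lab[j])
            matrix[i][j] = agree / r
    return matrix


def pair_disagreement(a, b):
    """Fraction of object pairs on which clusterings a and b disagree
    about co-membership (equivalently, 1 minus the Rand index)."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 0.0
    differ = sum(1 for i, j in pairs if (a[i] == a[j]) != (b[i] == b[j]))
    return differ / len(pairs)


# Hypothetical toy ensemble: three base clusterings of five objects.
# Label values are arbitrary; only co-membership matters.
labelings = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
C = co_association(labelings)
d01 = pair_disagreement(labelings[0], labelings[1])
```

Both functions depend only on whether two objects share a label, so renaming the clusters within any base clustering leaves the results unchanged. This matters because cluster labels from different runs have no inherent correspondence.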