Cluster Ensembles: Theory and Applications

doi:10.1201/9781315373515-22

ABSTRACT

The demonstrated success of classifier ensembles provides a direct motivation to study effective ways of combining multiple clustering solutions as well. This chapter presents the theory, design, and application of cluster ensembles, which address the problem of combining multiple “base clusterings” of the same set of objects into a single consolidated clustering. It formulates the cluster ensemble problem and introduces several measures for comparing a pair of clustering solutions. Because no ground truth is available for clustering problems, cluster ensemble algorithms instead aim to maximize some similarity measure between the consensus clustering and each of the base clustering solutions. The chapter then details cluster ensemble algorithms under three categories: probabilistic approaches, approaches based on co-association, and direct and other heuristic methods. It closes with applications of cluster ensembles; for example, consensus clustering has been applied to microarray data to improve the quality and robustness of the resulting clusters.
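As a rough illustration of two ideas named above, the sketch below builds a co-association matrix from a few base clusterings and scores a candidate consensus by its average similarity to them. It is a minimal sketch under stated assumptions, not the chapter's own algorithm: the Rand index is used here only as one familiar pairwise measure (the abstract does not say which measures the chapter adopts), and the function names and toy label vectors are hypothetical.

```python
from itertools import combinations


def co_association(base_clusterings):
    """Return an n x n matrix whose (i, j) entry is the fraction of base
    clusterings that place objects i and j in the same cluster."""
    n = len(base_clusterings[0])
    m = len(base_clusterings)
    counts = [[0] * n for _ in range(n)]
    for labels in base_clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def rand_index(a, b):
    """Fraction of object pairs on which two clusterings agree, i.e. both
    put the pair together or both keep it apart. Assumed measure."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def mean_agreement(candidate, base_clusterings):
    """Average similarity between a candidate consensus and every base
    clustering: the kind of objective a consensus method tries to maximize."""
    scores = [rand_index(candidate, b) for b in base_clusterings]
    return sum(scores) / len(scores)


# Hypothetical toy data: three base clusterings of four objects.
base = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first, with labels permuted
    [0, 0, 0, 1],
]

matrix = co_association(base)
good = mean_agreement([0, 0, 1, 1], base)
poor = mean_agreement([0, 0, 0, 0], base)
print(matrix[0][1], matrix[2][3], good, poor)
```

Both quantities depend only on which pairs of objects are grouped together, so they are unaffected by how each base clustering happens to name its clusters. In this toy data the candidate that splits the objects into two pairs agrees with the base clusterings more than the candidate that puts everything in one cluster.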