Resampling Methods for Exploring Cluster Stability

doi:10.1201/b19706-37

ABSTRACT

Abstract
28.1 Introduction
28.2 Resampling Partitions
    28.2.1 Prediction Strength
    28.2.2 Cluster-Wise Assessment of Stability
    28.2.3 Bootstrapping Average Cluster Stability
    28.2.4 Further Methods for Partitions
    28.2.5 Related Methods
28.3 Examples, Applications, Software
    28.3.1 Artificial Data
    28.3.2 Guest Survey Data
28.4 Conclusion
References

Model diagnostics for cluster analysis are still a developing field, owing to the exploratory nature of the method. Numerous indices have been proposed in the literature to evaluate goodness of fit, but no clear winner that works in all situations has been found yet. Derivation of (asymptotic) distributional properties is not possible in most cases. Resampling schemes provide an elegant framework for computationally deriving the distribution of quantities of interest that describe the quality of a partition. Important building blocks are the criteria for comparing partitions introduced in the previous chapter. Special emphasis is given to the stability of a partition: given a new sample from the same population, how likely is it that a similar clustering is obtained?
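
The stability question posed above can be illustrated with a small resampling experiment. The following is a minimal sketch under stated assumptions, not necessarily the procedure developed in the chapter: it assumes k-means as the clustering method, the plain Rand index as the criterion for comparing partitions, and pairs of bootstrap samples as the resampling scheme. All function names (kmeans, nearest, rand_index, bootstrap_stability) and the toy data are hypothetical and chosen only for illustration.

```python
import random


def nearest(p, centroids):
    """Index of the centroid closest to the 2-D point p (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda j: (p[0] - centroids[j][0]) ** 2 + (p[1] - centroids[j][1]) ** 2,
    )


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's algorithm on a list of 2-D tuples; returns k centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        new = []
        for j, g in enumerate(groups):
            if g:
                new.append((sum(x for x, _ in g) / len(g),
                            sum(y for _, y in g) / len(g)))
            else:
                new.append(centroids[j])  # keep the centroid of an empty cluster
        if new == centroids:
            break
        centroids = new
    return centroids


def rand_index(a, b):
    """Share of point pairs on which two labelings agree (same vs. different cluster)."""
    n = len(a)
    agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            agree += (a[i] == a[j]) == (b[i] == b[j])
    return agree / (n * (n - 1) / 2)


def bootstrap_stability(points, k, n_pairs=20, seed=0):
    """Rand indices between partitions induced by pairs of bootstrap samples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_pairs):
        labelings = []
        for _ in range(2):
            # Draw a bootstrap sample, cluster it, then assign every original
            # point to the nearest centroid so both partitions are comparable.
            sample = [rng.choice(points) for _ in points]
            centroids = kmeans(sample, k, rng)
            labelings.append([nearest(p, centroids) for p in points])
        scores.append(rand_index(labelings[0], labelings[1]))
    return scores


# Toy data (an assumption for illustration): three Gaussian blobs in the plane.
data_rng = random.Random(1)
points = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
          for cx, cy in [(0, 0), (5, 0), (0, 5)] for _ in range(30)]

for k in (2, 3, 4, 5):
    scores = bootstrap_stability(points, k)
    print(k, round(sum(scores) / len(scores), 3))
```

Each call returns an empirical distribution of agreement values for one choice of k; values close to 1 indicate that repeated samples lead to similar partitions. In practice one would likely use several random restarts per k-means run and a chance-corrected agreement index, both omitted here for brevity.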