Chapter 3

Quadratic Error and k-Means

By Boris Mirkin

Abstract
3.1 Conventional k-Means Clustering
    3.1.1 General
    3.1.2 Three Frameworks for the Square-Error Criterion
        3.1.2.1 Naive Framework
        3.1.2.2 Approximation Framework
        3.1.2.3 Probabilistic Modeling Framework
3.2 Equivalent Reformulations of k-Means Criterion
    3.2.1 Anomalous Clustering Criterion
    3.2.2 Inner-Product k-Means Criterion
    3.2.3 Kernel-Wise Criterion of Maximization of the Semi-Averaged Internal Similarities
    3.2.4 Spectral Rayleigh Quotient Formulation
3.3 Challenges for k-Means Criterion and Algorithm
    3.3.1 Properties of the Criterion
        3.3.1.1 Convexity of Cluster Domains
        3.3.1.2 Monotonicity of the Minimum with Respect to the Number of Clusters K
        3.3.1.3 Dependence on the Feature Measurement Scales
        3.3.1.4 Propensity to Balance the Sizes of Clusters
        3.3.1.5 Square Error Versus Consensus
    3.3.2 Different Clustering Structures
    3.3.3 Initialization of k-Means
    3.3.4 Getting a Deeper Minimum
    3.3.5 Interpretation Aids
    3.3.6 Feature Weighting, Three-Stage k-Means, and Minkowski Metric
3.4 Notes on Software for k-Means
Acknowledgments
References

This chapter presents an updated review of k-means clustering, arguably the most popular clustering method. First, the square-error k-means criterion and method are introduced in three frameworks: a naive one, data recovery, and mixture of distributions. Then several equivalent reformulations are given, each leading to a different local optimization strategy. A number of challenges, and ways of addressing them, are discussed as related to both the properties of solutions and methods for finding the optimum. A few extensions are mentioned, including fuzzy clustering and feature weighting.
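To make the square-error criterion mentioned above concrete, here is a minimal sketch of the conventional (Lloyd's) k-means iterations: points are assigned to their nearest centroid, then each centroid is recomputed as the mean of its cluster, which monotonically decreases the within-cluster sum of squared distances. This is an illustrative implementation, not the chapter's own code; the function name `kmeans` and its parameters are assumptions for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal sketch of conventional k-means on data matrix X (n x p).

    Alternates two steps that together never increase the square-error
    criterion W = sum_i ||x_i - c_{label(i)}||^2:
      1. assignment: each point goes to its nearest centroid;
      2. update: each centroid becomes the mean of its cluster.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids as k points drawn from the data (one common choice).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: squared distances from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: mean of each cluster; keep old centroid if cluster empties.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged to a local minimum of the criterion
        centroids = new_centroids
    square_error = dists[np.arange(len(X)), labels].sum()
    return labels, centroids, square_error
```

Note that the result depends on the random initialization and is only a local minimum of the criterion, which is exactly the initialization issue taken up in Section 3.3.3.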