ABSTRACT

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384 18.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384 18.2 Problem Formulation and Mean-Shift Algorithms . . . . . . . . . . . . . . . . . . . . . . 386

18.2.1 Two Basic Types of Mean-Shift Algorithms: MS and BMS . . . . . . . . . . . 387 18.2.1.1 Clustering by Mean-Shift (MS): Find Modes . . . . . . . . . . . . . . 387 18.2.1.2 Clustering by Blurring Mean-Shift (BMS): Smooth the Data . . . 388 18.2.1.3 Similarities and Differences between MS and BMS . . . . . . . . . 388 18.2.1.4 Choice of Bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388 18.2.1.5 Choice of Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389 18.2.1.6 Hierarchical Mean-Shift and Scale-Space Clustering . . . . . . . . 389 18.2.1.7 Matrix Formulation of BMS and Generalizations of BMS . . . . . 390 18.2.1.8 Mapping New Points to Clusters (Out-of-Sample Mapping) . . 391 18.2.1.9 Advantages and Disadvantages of Mean-Shift Algorithms . . . 391

18.2.2 Origins of the Mean-Shift Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 392 18.2.3 Theoretical Results about Mean-Shift Algorithms and

Gaussian Mixtures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392 18.2.3.1 Geometry of the Modes of a Gaussian Mixture . . . . . . . . . . . . 393 18.2.3.2 Convergence of MS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394 18.2.3.3 Other Properties of MS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395 18.2.3.4 Convergence of BMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396

18.2.4 Relations with Other Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397 18.2.4.1 Spectral Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397 18.2.4.2 Bilateral Filtering and Nonlinear Diffusion . . . . . . . . . . . . . . . 397 18.2.4.3 Other Mean-Shift-Like Algorithms . . . . . . . . . . . . . . . . . . . . . 398

18.2.5 Extensions of Mean-Shift for Clustering . . . . . . . . . . . . . . . . . . . . . . . . 398 18.2.5.1 Tracking: Mode Finding over Time . . . . . . . . . . . . . . . . . . . . . 398 18.2.5.2 Manifold Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399 18.2.5.3 Graph Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399 18.2.5.4 Other Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400

18.2.6 Extensions of Mean-Shift beyond Clustering: Manifold Denoising . . . . . 400 18.2.7 Accelerated Mean-Shift Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 402

18.2.7.1 Accelerating Mean-Shift (MS) . . . . . . . . . . . . . . . . . . . . . . . . . 402 18.2.7.2 Accelerating Blurring Mean-Shift (BMS) . . . . . . . . . . . . . . . . . 404

18.3 K -Modes and Laplacian K -Modes Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 405 18.3.1 Cluster Representatives: Means, Modes, and Medoids . . . . . . . . . . . . . 406

of

18.4 Examples, Applications, Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407 18.4.1 Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407 18.4.2 Cases When Mean-Shift Clustering Does Not Work Well . . . . . . . . . . . . 408 18.4.3 Multivalued Regression and Inversion . . . . . . . . . . . . . . . . . . . . . . . . . 410 18.4.4 Other Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411

18.5 Conclusion and Open Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411 Connected-Components Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413

A natural way to characterize the cluster structure of a dataset is by finding regions containing a high density of data. This can be done in a nonparametric way with a kernel density estimate, whose modes and hence clusters can be found using mean-shift algorithms. We describe the theory and practice behind clustering based on kernel density estimates and mean-shift algorithms. We discuss the blurring and nonblurring versions of mean-shift; theoretical results about mean-shift algorithms and Gaussian mixtures; relations with scale-space theory, spectral clustering and other algorithms; extensions to tracking, to manifold and graph data, and to manifold denoising; K -modes and Laplacian K -modes algorithms; acceleration strategies for large datasets; and applications to image segmentation, manifold denoising and multivalued regression.