ABSTRACT

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 7.1 Similarity-Based Clustering: Definitions and Criteria . . . . . . . . . . . . . . . . . . . . . 125

7.1.1 What Is Similarity-Based Clustering? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 7.1.2 Similarity-Based Clustering and Cuts in Graphs . . . . . . . . . . . . . . . . . . . 126 7.1.3 The Laplacian and Other Matrices of Spectral Clustering . . . . . . . . . . . . . 127 7.1.4 Four Bird’s Eye Views of Spectral Clustering . . . . . . . . . . . . . . . . . . . . . . 128

7.2 Spectral Clustering Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 7.3 Understanding the Spectral Clustering Algorithms . . . . . . . . . . . . . . . . . . . . . . 132

7.3.1 RandomWalk/Markov Chain View . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 7.3.2 Spectral Clustering as Finding a Small Balanced Cut in G . . . . . . . . . . . . . 134 7.3.3 Spectral Clustering as Finding Smooth Embeddings . . . . . . . . . . . . . . . . 136

7.4 Where Do the Similarities Come From? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 7.5 Practical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 7.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

Spectral clustering is a family of methods to find K clusters using the eigenvectors of a matrix. Typically, this matrix is derived from a set of pairwise similarities Si j between the points to be clustered. This task is called similarity-based clustering, graph clustering, or clustering of diadic data. One remarkable advantage of spectral clustering is its ability to cluster “points” which are not necessarily vectors, and to use for this a “similarity,” which is less restrictive than a distance. A second advantage of spectral clustering is its flexibility; it can find clusters of arbitrary shapes, under realistic separations. This chapter introduces the similarity-based clustering paradigm, describes the algorithms used, and sets the foundations for understanding these algorithms. Practical aspects, such as obtaining the similarities, are also discussed.