ABSTRACT

This chapter discusses several approaches to clustering, along with their properties, advantages, and disadvantages. It examines how to evaluate clustering solutions and considers the application of clustering algorithms to the protein space. Partitional clustering algorithms partition a finite set of objects into clusters so as to optimize a given criterion function. The term hierarchical clustering describes techniques for obtaining a sequence of nested clusters. The k-means algorithm is a simple and popular partitional clustering algorithm that partitions the data so as to minimize the squared error function. Graph clustering algorithms are of particular interest when the data are given as a proximity matrix containing pairwise similarities or dissimilarities. The steps of a pairwise clustering algorithm are equivalent to operations on the graph induced by the proximity matrix, and the resulting clusters correspond to sets of nodes in that graph.
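
As an illustration of the k-means idea mentioned above, the following is a minimal sketch of the standard iterative (Lloyd-style) procedure, not necessarily the exact formulation used in the chapter. The function names (squared_distance, kmeans), the random initialization from the data points, and the toy data are assumptions made for this example. The squared error being minimized is the sum, over all points, of the squared Euclidean distance between each point and the centroid of its assigned cluster.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Illustrative k-means: alternate assignment and centroid update.

    Returns (assignment, centroids, sse), where sse is the squared error
    function evaluated at the final solution.
    """
    rng = random.Random(seed)
    # Assumption: initialize centroids with k distinct data points.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the procedure has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignment))
    return assignment, centroids, sse


# Toy data (hypothetical): two well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centroids, sse = kmeans(points, k=2)
print(assignment, centroids, sse)
```

Each iteration of this procedure cannot increase the squared error, but in general it converges only to a local minimum that depends on the initial centroids, which is one reason evaluation of clustering solutions matters.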