ABSTRACT

Clustering is the process of organizing a set of data into groups such that observations within a group are more similar to each other than they are to observations in a different group. It is assumed that the data represent features that allow one group to be distinguished from another. One of the most common approaches to clustering is the hierarchical method, which is popular in areas such as data mining and gene expression analysis. Single linkage is perhaps the method used most often in agglomerative clustering, and it is the default method in the MATLAB linkage function, which produces the hierarchical clustering. The average linkage method defines the distance between two clusters as the average distance from all observations in one cluster to all observations in the other. Ward devised a method for agglomerative hierarchical clustering in which the pair of clusters to merge is determined by the size of the incremental sum of squares, that is, the increase in the within-cluster sum of squares that the merge would cause.
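
The three merging criteria named above can be illustrated with a minimal sketch in plain Python. This is not the MATLAB linkage function. The function names (single_linkage, average_linkage, ward_increase) and the two toy clusters are hypothetical, chosen only for illustration. The sketch assumes Euclidean distance and takes single linkage to mean the smallest pairwise distance between two clusters, which is the usual definition but is not spelled out in the text.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage(a, b):
    """Smallest distance between any point in a and any point in b."""
    return min(dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    """Mean of all pairwise distances between points in a and points in b."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def ward_increase(a, b):
    """Increase in the within-cluster sum of squares if a and b are merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * dist(centroid(a), centroid(b)) ** 2


# Hypothetical toy data: two clusters of two 2-D observations each.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (3.0, 2.0)]

print(single_linkage(cluster_a, cluster_b))   # nearest pair of points
print(average_linkage(cluster_a, cluster_b))  # mean over all four pairs
print(ward_increase(cluster_a, cluster_b))    # incremental sum of squares
```

In an agglomerative procedure, the chosen criterion would be evaluated for every pair of current clusters, and the pair with the smallest value would be merged at each step.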