ABSTRACT

The PAM (Partitioning Around Medoids) algorithm searches for k representative objects, called medoids, in a data set and then assigns each object to the closest medoid to form clusters. Its aim is to minimize the sum of dissimilarities between the objects in a cluster and that cluster's medoid. It is regarded as a robust variant of k-means, since medoids are less sensitive to outliers than means. Agglomerative hierarchical clustering, in contrast, builds clusters incrementally, producing a tree, or dendrogram. The algorithm begins by assigning each sample to its own cluster and then repeatedly merges the closest clusters. Hierarchical clustering is a powerful approach for identifying groups in a data set and is often applied to life-sciences data, which frequently has a built-in hierarchy. It also has the distinct advantage that any valid measure of dissimilarity can be used.
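The medoid search and assignment steps described above can be sketched in a few lines. The following is a minimal, self-contained illustration only: the function name `pam`, the greedy swap loop, the convergence criterion, and the toy 1-D data are all assumptions made for this sketch, not the implementation studied here.

```python
import random

def pam(points, k, dist, seed=0):
    """Minimal PAM sketch: greedily swap medoids while the total
    sum of dissimilarities to the nearest medoid keeps decreasing."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)

    def cost(meds):
        # sum of dissimilarities from each object to its closest medoid
        return sum(min(dist(p, points[m]) for m in meds) for p in points)

    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in range(len(points)):
                if h in medoids:
                    continue
                trial = medoids[:]
                trial[i] = h  # try swapping medoid i for candidate h
                if cost(trial) < cost(medoids):
                    medoids = trial
                    improved = True
    # assign each object to the index of its closest medoid
    labels = [min(medoids, key=lambda m: dist(p, points[m])) for p in points]
    return medoids, labels

# toy 1-D data with two well-separated groups (illustrative only)
points = [1.0, 1.2, 0.9, 10.0, 10.3, 9.8]
medoids, labels = pam(points, k=2, dist=lambda a, b: abs(a - b))
```

Because the centers are restricted to actual data objects and the cost is a sum of raw dissimilarities rather than squared distances, a single extreme outlier shifts the medoids far less than it would shift a k-means centroid.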