ABSTRACT

Hierarchical clustering is an alternative to k-means clustering for identifying groups in a data set. The major concepts of hierarchical clustering are illustrated using the Ames housing data. Several agglomeration (linkage) methods are available for defining clusters in a hierarchical cluster analysis; complete linkage and Ward’s method are often preferred for AGglomerative NESting (AGNES) clustering. Hierarchical clustering can offer some benefits over k-means: the number of clusters does not have to be specified in advance, and the result can be displayed as a dendrogram, a tree-based illustration of the cluster hierarchy. Each linkage method has its own systematic tendencies in the way it groups observations, so different methods can produce substantially different results. For example, the centroid method has a bias toward producing irregularly shaped clusters, whereas Ward’s method tends to produce clusters with roughly the same number of observations, and its solutions tend to be heavily distorted by outliers.