ABSTRACT

This chapter provides an overview of several unsupervised clustering techniques. As it is not necessary to study all of these techniques to obtain an overall understanding of clustering, each section is self-contained. The chapter begins by illustrating how the K-Means algorithm partitions instances into disjoint clusters. The focal point of the second section is agglomerative clustering. The third section describes Cobweb’s incremental hierarchical clustering technique. The fourth section shows how the expectation maximization (EM) algorithm uses classical statistics to perform unsupervised clustering. The final section of Chapter 11 presents scripts written for five unsupervised learning experiments. The first experiment uses supervised learning to help interpret the meaning of the clusters obtained by applying the K-Means algorithm to gamma-ray burst data. The second experiment uses K-Means to evaluate the relevance of the input attributes of a dataset representing females who did or did not test positive for diabetes. The third experiment illustrates the hierarchical data structure produced by an agglomerative clustering of a small dataset. The fourth experiment demonstrates an automated process for determining the correct number of clusters within a dataset. The fifth experiment shows how agglomerative clustering can be used with datasets containing both numeric and categorical data. Several end-of-chapter exercises are provided.