ABSTRACT

This chapter is about clustering, a form of unsupervised learning. Like classification, discussed in Chapter 11, clustering methods focus on subgroups of a data set. But instead of using known group labels, the goal is to discover reasonable groups. Well-known methods, such as k-means and hierarchical clustering, are compared and contrasted. For the latter, various distance measures and linkage functions are considered. Both two-dimensional and high-dimensional examples show strong differences in the behavior of linkage functions across settings. The use of data visualizations to find clusters is also considered. An overview of hybrid methods, such as fuzzy clustering, and of the integration of clustering with other statistical tasks is also given.