ABSTRACT

We now turn our attention to the problem of finding groups, or clusters, in our data, which is an important task in EDA and data mining. In this chapter we present two of the basic methods: agglomerative clustering and k-means clustering. Another method, fuzzy clustering based on estimating a finite mixture probability density function, is described in the following chapter. Most clustering methods allow users to specify the desired number of groups. We address the problem of assessing the quality of the resulting clusters at the end of this chapter, where we describe several statistics and plots that aid in the analysis. Chapter 8 provides further ways to assess cluster output graphically.