ABSTRACT

This chapter focuses on two popular clustering techniques: K-means and expectation maximization (EM). The K-means algorithm is easy to understand and is extremely popular. The EM algorithm is harder to understand and consequently less widely used, but it can give more refined results in many cases. The chapter discusses the issue of measuring cluster quality. It considers how to implement a generic approach to clustering, which leads naturally to the K-means algorithm. There are several variations on the popular K-means algorithm; two such variants are K-medoids and fuzzy K-means. The chapter presents a useful measure of intrinsic cluster quality that is based on a similarity matrix. Another approach to measuring cluster quality is to compute a residual sum of squares (RSS) error term, which is also known as a sum of squared errors (SSE). The EM clustering algorithm is more complex than K-means, but it also allows for more variation in cluster shape, and hence should produce better results in many cases.