ABSTRACT

This chapter is about the basics of the hard and fuzzy c-means models. The G-means algorithm is based on a statistical test for the hypothesis that a subset of data follows a Gaussian distribution. G-means runs hard c-means with increasing c in a hierarchical fashion until the test accepts the hypothesis that the data assigned to each c-means center are Gaussian. Drake and Hamerly introduce the idea of having a variable number of lower bounds per point on the c-means distances. Their algorithm affords a compromise between the approaches of Elkan and Hamerly. The various representations of the c-means objective functions lead to complementary facts and a deeper understanding of c-means clustering.