ABSTRACT

K-means clustering is one of the most commonly used clustering algorithms and is usually the first one applied to a clustering task. It partitions observations into k groups, where k is specified in advance by the analyst. Like other clustering algorithms, k-means assigns observations to mutually exclusive groups such that observations within the same cluster are as similar as possible, whereas observations in different clusters are as dissimilar as possible. Each cluster is represented by its center, which is the mean of the observations assigned to that cluster. The basic idea is to construct the clusters so that the total within-cluster variation is minimized. K-means implicitly assumes that each point lies closer to its own cluster center than to any other center. This assumption breaks down when the clusters have complicated geometries, because k-means can only produce clusters with convex boundaries.
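
As a rough illustration of the procedure described above, the following Python sketch alternates between assigning each observation to its nearest center and recomputing each center as the mean of its assigned observations. It assumes squared Euclidean distance as the measure of within-cluster variation and initializes the centers from randomly chosen observations; neither choice is specified in the text. The function names (kmeans, total_within_cluster_variation) and the toy data are hypothetical and serve only as an example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))


def mean_point(members):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(members)
    return tuple(sum(coords) / n for coords in zip(*members))


def kmeans(points, k, max_iter=100, seed=0):
    """Alternate nearest-center assignment and mean updates until labels stop changing."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k randomly chosen observations
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest_center(p, centers) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move either
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean_point(members)
    return centers, labels


def total_within_cluster_variation(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(squared_distance(p, centers[label]) for p, label in zip(points, labels))


# Hypothetical toy data: two well-separated, roughly convex groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, labels = kmeans(points, k=2)
variation = total_within_cluster_variation(points, centers, labels)
print(labels, centers, variation)
```

On well-separated, roughly convex groups such as the toy data here, the iteration recovers the two groups. On non-convex shapes (for example, concentric rings), the nearest-center rule would split the data along straight boundaries, which is the limitation noted above.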