ABSTRACT

This chapter introduces K-means and density-based clustering, two algorithms that produce nonhierarchical groups of similar data points based on cluster centroids and cluster density, respectively. A list of software packages that support these clustering algorithms is provided. In K-means clustering, the mean vector of the data points in a cluster is often used as the centroid of the cluster. One method of assigning the initial centroids of the K clusters is to randomly select K data points from the data set and use them as the centroids. Although this method uses specific data points to set the initial centroids, each of the K clusters initially contains no data points. Density-based clustering treats clusters as regions of high data-point density, where density is measured by the number of data points within a given radius.
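
The three quantities named above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names (`initial_centroids`, `mean_vector`, `neighbor_count`), the toy data set, the fixed random seed, the use of Euclidean distance, and the choice to count a point as lying within its own radius are all assumptions made for illustration.

```python
import math
import random


def initial_centroids(points, k, seed=0):
    """Randomly select k distinct data points to act as initial centroids."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    return [tuple(p) for p in rng.sample(points, k)]


def mean_vector(points):
    """Centroid of a cluster, taken as the component-wise mean of its points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def neighbor_count(points, center, radius):
    """Density at `center`: number of points within `radius` of it.

    Euclidean distance is assumed; a data point located at `center`
    is included in the count.
    """
    return sum(1 for p in points if math.dist(p, center) <= radius)


# Hypothetical toy data: four points near the origin and one far away.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]

centroids = initial_centroids(data, k=2)
# The centroids are copied from data points, but no point has been
# assigned to any cluster yet, so every cluster starts empty.
clusters = [[] for _ in centroids]

square_centroid = mean_vector(data[:4])
dense = neighbor_count(data, (0.0, 0.0), radius=1.0)
sparse = neighbor_count(data, (10.0, 10.0), radius=1.0)
```

In this sketch the region around the origin has a higher neighbor count than the isolated point at (10, 10), which is the sense in which a density-based method would regard the former as part of a cluster.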