ABSTRACT

This chapter discusses k-means clustering, one of the most popular unsupervised learning algorithms. K-means follows a simple iterative rule to partition an unlabeled dataset into clusters. The center (centroid) of each cluster is the arithmetic mean of the coordinates, or features, of the data points that belong to that cluster. Starting from randomly placed centroids, each data point is assigned to the cluster whose centroid is nearest, and the centroids are then recomputed from the assigned points. Repeating these two steps reduces the sum of the squared distances between the data points and their cluster’s centroid until it settles at a (possibly local) minimum. To generate cluster data, sklearn provides a module named “datasets”. The area and structure of the original clusters and of the clusters predicted by k-means are mostly similar, with small errors in the regions where clusters overlap. Unsupervised clustering has several applications in gene expression analysis.
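
The workflow summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's own code: the choice of make_blobs as the generator, the sample count, the number of clusters, the cluster spread, and the random seed are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Assumed, illustrative settings: 300 points drawn around 3 centers in 2-D.
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

# Fit k-means; n_init=10 restarts the algorithm from 10 random initializations
# and keeps the run with the lowest sum of squared distances.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Recompute the objective by hand: the sum of squared distances between
# each point and the centroid of the cluster it was assigned to.
assigned_centroids = kmeans.cluster_centers_[labels]
manual_inertia = float(np.sum((X - assigned_centroids) ** 2))

# Agreement between the generated clusters and the predicted ones
# (1.0 means identical partitions, values near 0 mean chance-level agreement).
ari = adjusted_rand_score(y_true, labels)

print("inertia (sklearn):", kmeans.inertia_)
print("inertia (manual): ", manual_inertia)
print("adjusted Rand index:", ari)
```

The adjusted Rand index is used here only as one convenient way to compare the original and predicted clusters, since k-means numbers its clusters arbitrarily. Points lying where the generated clusters overlap are the ones most likely to be assigned differently from their original group.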