ABSTRACT

The rapid development of digital technologies is changing the nature and behavior of data, as well as how data are created and managed, particularly in the context of big data. Traditional technologies struggle to process, mine, and analyze such data because of its volume, variety, and velocity. Data clustering is a widely used big data mining technique, and each clustering algorithm is designed to process a specific type of data in the shortest possible time. Because of these big data characteristics, existing clustering algorithms neither scale nor achieve the speedup required for big data mining. Big data clustering strategies commonly employ divide-and-conquer, incremental learning, feature selection, granular computing, sampling, and parallel processing. A common challenge with partitional clustering algorithms is the selection of the initial centroids. The K-means algorithm is one such partitional clustering algorithm; it suffers from local optima and convergence-speed issues caused by the choice of initial centroids. This study proposes a universal initialization model for partitional clustering algorithms that reduces execution time and improves convergence speed without affecting cluster quality. This chapter examines several K-based partitional clustering algorithms with respect to big data attributes and presents the proposed strata-based initialization model. The strata-based model is implemented using the K-means algorithm, and the resulting algorithm is called Strata-based Initialization K-means (SIK-means).