Big Data Cluster Analysis: A Study of Existing Techniques and Future Directions

doi:10.1201/b21822-11

ABSTRACT

Cluster analysis (clustering) is a fundamental problem in unsupervised machine learning. It has a wide range of applications in fields including bioinformatics, gene sequencing, market basket analysis, medicine, social network analysis, and recommender systems. The main idea in clustering is to arrange similar data objects from a given data set into clusters (groups). Clusters usually represent some type of real-world entity or a meaningful abstraction. Knowledge of this abstraction serves as a stepping stone to further analysis and is therefore a fundamental requirement in data analysis problems. With growing capabilities in data collection, transmission, storage, and computing, there has been a steady increase in research based on cluster analysis.