ABSTRACT

Traditional clustering algorithms such as k-means and hierarchical clustering are heuristic-based: they derive clusters directly from the data and do not attach a measure of probability or uncertainty to the cluster assignments. Model-based clustering does provide such a measure, with the added benefit of automatically identifying the optimal number of clusters. This chapter examines Gaussian mixture models, one of the most popular model-based clustering approaches. The key idea behind model-based clustering is that the data are treated as coming from a mixture of underlying probability distributions. The covariance matrix of each mixture component describes the geometry of its cluster: namely, the volume, shape, and orientation. Model-based clustering techniques do have limitations. The methods require an underlying model for the data, and the cluster results depend heavily on that assumption. Classical model-based clustering also shows disappointing computational performance in high-dimensional spaces.
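
For orientation, the mixture idea and the covariance geometry summarized above are commonly written as follows. This is a sketch in standard textbook notation, not necessarily the notation used in the chapter: the symbols G (number of components), \pi_k (mixing weights), \mu_k and \Sigma_k (component mean and covariance), z_{ik} (membership probability), and the factors \lambda_k, D_k, A_k are all assumptions introduced here for illustration.

```latex
% Gaussian mixture density with G components
f(x) = \sum_{k=1}^{G} \pi_k \, \phi(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{G} \pi_k = 1

% Probability that observation x_i belongs to component k
z_{ik} = \frac{\pi_k \, \phi(x_i \mid \mu_k, \Sigma_k)}
              {\sum_{j=1}^{G} \pi_j \, \phi(x_i \mid \mu_j, \Sigma_j)}

% Eigenvalue decomposition of each component covariance:
%   \lambda_k  scalar, controls the volume
%   A_k        diagonal matrix with \det A_k = 1, controls the shape
%   D_k        orthogonal matrix of eigenvectors, controls the orientation
\Sigma_k = \lambda_k \, D_k A_k D_k^{\top}
```

Under this parameterization, constraining \lambda_k, A_k, or D_k to be equal across components yields clusters that share volume, shape, or orientation, respectively.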