chapter  3
Probabilistic Models for Clustering
ByHongbo Deng, Jiawei Han
Pages 26

Probabilistic model-based clustering techniques have been widely used and have shown promising results in many applications, ranging from image segmentation [71, 15], handwriting recognition [60], document clustering [36, 81], topic modeling [35, 14] to information retrieval [43]. Model-based clustering approaches attempt to optimize the fit between the observed data and some mathematical model using a probabilistic approach. Such methods are often based on the assumption that the data are generated by a mixture of underlying probability distributions. In practice, each cluster can be represented mathematically by a parametric probability distribution, such as a Gaussian or a Poisson distribution. Thus, the clustering problem is transformed into a parameter estimation problem since the entire data can be modeled by a mixture of K component distributions. 62Data points (or objects) that belong most likely to the same distribution can then easily be defined as clusters.