ABSTRACT

This chapter introduces several fundamental models and algorithms for probabilistic clustering, including mixture models, the Expectation-Maximization (EM) algorithm, and probabilistic topic models. It focuses on the general modeling framework, the probabilistic interpretation, the standard algorithms for learning each model, and their applications. Mixture models are probabilistic models that are increasingly used to find clusters in univariate and multivariate data. The chapter begins with a discussion of mixture models, in which the values of discrete latent variables can be interpreted as assignments of data points to specific components of the mixture. It uses the Gaussian mixture model to motivate the EM algorithm informally, and then gives a more general view of the EM algorithm, which is a standard learning algorithm for many probabilistic models. The chapter presents two popular probabilistic topic models, namely probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA), for document clustering and analysis.
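To make the EM-for-Gaussian-mixtures idea concrete, here is a minimal sketch (not the chapter's own code) of EM for a one-dimensional two-component Gaussian mixture: the E-step computes each component's responsibility for each point, and the M-step re-estimates the mixing weights, means, and variances from those responsibilities. The function name `em_gmm_1d` and the quantile-based initialization are illustrative choices, not from the chapter.

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=50):
    """EM for a 1D Gaussian mixture with k components.

    Alternates the E-step (soft assignments of points to components)
    and the M-step (weighted maximum-likelihood parameter updates).
    """
    # Illustrative initialization: spread the means across the data
    # via quantiles; unit-free variances from the overall spread.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, np.var(x))
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility resp[n, j] = P(component j | x[n]).
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        # from the responsibility-weighted data.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

# Synthetic data: two well-separated clusters around 0 and 5.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1, 500)])
pi, mu, var = em_gmm_1d(data)
```

The soft assignments `resp` are exactly the interpretation the chapter describes: each row gives the posterior probability that a data point was generated by each mixture component.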