chapter  10
22 Pages

Dirichlet Process Mixtures and Nonparametric Bayesian Approaches to Clustering

ByVinayak Rao

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195 10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196 10.2 The Dirichlet Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197

10.2.1 The Pólya Urn Scheme and the Chinese Restaurant Process . . . . . . . . . . 199 10.2.2 The Stick-Breaking Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201 10.2.3 Dirichlet Process Mixtures and Clustering . . . . . . . . . . . . . . . . . . . . . . . 202

10.3 Example Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 10.3.1 Clustering Microarray Expression Data . . . . . . . . . . . . . . . . . . . . . . . . . 203 10.3.2 Bayesian Haplotype Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204 10.3.3 Spike Sorting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

10.4 Posterior Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204 10.4.1 Markov Chain Monte Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 10.4.2 Variational Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208 10.4.3 Comments on Posterior Computation . . . . . . . . . . . . . . . . . . . . . . . . . . 208

10.5 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 10.5.1 Exchangeability and Consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 10.5.2 Pitman-Yor Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 10.5.3 Dependent Random Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

10.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

Interest in nonparametric Bayesian methods has grown rapidly as statisticians and machine-learning researchers work with increasingly complex datasets, and as computation grows faster and cheaper. Central to developments in this field has been the Dirichlet process. In this chapter, we introduce the Dirichlet process and review its properties and its different representations. We describe the Dirichlet process mixture model that is fundamental to clustering problems and review some of its applications. We discuss various approaches to posterior inference and conclude with a short review of extensions beyond the Dirichlet process.