ABSTRACT

CONTENTS
9.1 Introduction
9.2 Clustering by Mixture Models
9.3 Applications
9.4 Discussion
References

Many unsupervised learning tasks involve high-dimensional data sets in which some attributes are continuous and others are categorical. One approach to clustering such data is to assume that they arise from a finite mixture of populations. The mixture likelihood approach is well developed and widely used, especially for mixtures whose component distributions are multivariate normal. Hunt and Jorgensen [17] presented a methodology for clustering mixed categorical and continuous data within this mixture model framework.
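To make the mixture likelihood idea concrete, the sketch below treats each record x as drawn from the density f(x) = π_1 f_1(x) + ... + π_g f_g(x). It assigns x to the component with the largest posterior probability τ_k(x) = π_k f_k(x) / f(x). This is a minimal illustration under stated assumptions, not the Hunt and Jorgensen methodology. It assumes the mixing weights and component parameters have already been estimated (for example by EM, which is not shown). It also assumes that, within a component, every continuous attribute is an independent normal and every categorical attribute is an independent categorical variable. All function names and the example parameters are hypothetical.

```python
import math


def logsumexp(values):
    # Numerically stable log(sum(exp(v) for v in values)).
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def normal_logpdf(x, mean, var):
    # Log density of a univariate normal N(mean, var).
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def component_logpdf(cont, cat, comp):
    # Log density of one mixture component for a single record.
    # ASSUMPTION (simplification for illustration only): within a component,
    # continuous attributes are independent normals and categorical
    # attributes are independent categorical variables.
    logp = 0.0
    for x, m, v in zip(cont, comp["means"], comp["vars"]):
        logp += normal_logpdf(x, m, v)
    for level, probs in zip(cat, comp["cat_probs"]):
        logp += math.log(probs[level])
    return logp


def weighted_logs(cont, cat, weights, components):
    # log(pi_k) + log f_k(x) for every component k.
    return [math.log(w) + component_logpdf(cont, cat, comp)
            for w, comp in zip(weights, components)]


def posterior_memberships(cont, cat, weights, components):
    # tau_k(x) = pi_k f_k(x) / sum_j pi_j f_j(x), computed in log space.
    logs = weighted_logs(cont, cat, weights, components)
    norm = logsumexp(logs)
    return [math.exp(l - norm) for l in logs]


def mixture_loglik(records, weights, components):
    # Mixture log-likelihood: sum over records of log sum_k pi_k f_k(x_i).
    return sum(logsumexp(weighted_logs(cont, cat, weights, components))
               for cont, cat in records)


def assign_cluster(cont, cat, weights, components):
    # Hard clustering: index of the component with the largest posterior.
    tau = posterior_memberships(cont, cat, weights, components)
    return max(range(len(tau)), key=tau.__getitem__)


if __name__ == "__main__":
    # Hypothetical two-component model with one continuous attribute and
    # one binary categorical attribute (levels coded 0 and 1).
    components = [
        {"means": [0.0], "vars": [1.0], "cat_probs": [[0.8, 0.2]]},
        {"means": [5.0], "vars": [1.0], "cat_probs": [[0.1, 0.9]]},
    ]
    weights = [0.5, 0.5]
    records = [([0.2], [0]), ([4.8], [1])]
    for cont, cat in records:
        print(cont, cat, "-> cluster",
              assign_cluster(cont, cat, weights, components))
    print("log-likelihood:", mixture_loglik(records, weights, components))
```

Working in log space with a log-sum-exp helper avoids underflow when many attributes multiply small densities together. Relaxing the independence assumption would mean replacing `component_logpdf` with a richer joint model of the mixed attributes.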