ABSTRACT

CONTENTS
22.1 Introduction
22.2 The Basic LC Factor Model
22.3 Examples
22.4 Conclusion
References

A major goal of data mining is to extract a small number of meaningful “factors” from a larger number of variables available in a database. While traditional factor analysis (FA) offers such a data reduction capability, it is severely limited in practice because it requires all variables to be continuous and relies on the assumption of multivariate normality to justify a linear model. In this paper, we propose a general maximum likelihood alternative to FA that does not have these limitations. It can be used to analyze combinations of dichotomous, nominal, ordinal, and count variables, and it uses an appropriate distribution for each scale type. The approach uses a framework based on latent class (LC) modeling that hypothesizes categorical rather than continuous factors, each of which has a small number of discrete levels. One surprising result is that exploratory LC factor models are identified, whereas traditional exploratory FA models are not identified without imposing a rotation.