Interpretability of Mixed Membership Models

doi:10.1201/b17520-16

ABSTRACT

Although shared membership of individuals in two or more categories of a classification scheme is a distinguishing feature of the family of mixed membership models, relatively few analyses using these models pay much attention to this feature. Most analyses published to date focus on identifying and interpreting the extreme, or ideal, types consistent with a given body of data. In effect, they use mixed membership models as crisp clustering techniques. Venturing into the domain of shared membership quickly places the investigator in a difficult position. Standard estimation strategies produce a large number of ideal profiles, almost always more than six, as the best-fitting representations of the data. With that many profiles, however, it becomes impossible to interpret what membership in, say, four or more profiles actually means. Conventional mixed membership models usually cannot resolve this conflict between statistical goodness-of-fit and subject-matter-based interpretability of shared membership. We introduce separate mixed membership models, each containing a small number of ideal profiles and each describing the population through responses focused on one distinct subject-matter domain. These models jointly produce a vector of correlated grade of membership scores for each individual. We show that this approach makes it feasible to interpret shared membership across the distinct subject-matter domains. Deciding what constitutes a good model requires tradeoffs between statistical goodness-of-fit criteria and subject-matter-based interpretation, which frequently cannot be quantified. We illustrate these unavoidable tradeoffs in several epidemiological contexts.