ABSTRACT

Mixed membership models have emerged over the past 20 years as a flexible, cluster-like modeling tool for unsupervised analysis of high-dimensional multivariate data in which the assumption that each observational unit belongs to a single cluster, or principal component, is violated. Instead, one assumes that every unit belongs partially to all clusters, according to an individual membership vector. Mixed membership models were introduced essentially independently in several statistical application settings: (1) survey data (Berkman et al., 1989; Erosheva, 2002; Erosheva et al., 2007), (2) population genetics (Pritchard et al., 2000b; Rosenberg et al., 2002), (3) text analysis (Blei et al., 2003; Erosheva et al., 2004; Airoldi et al., 2010), and later (4) image processing and annotation (Barnard et al., 2003; Fei-Fei and Perona, 2005) and (5) molecular biology (Segal et al., 2005; Airoldi et al., 2006; 2007; 2013).