ABSTRACT

In this chapter, we aim to bring together techniques from computer science and social science methodology to efficiently find subgroups of university students that show unusual developmental trajectories in dropout intentions. In the data mining literature, efficient algorithms for subgroup discovery, a pattern mining task, are widely used to identify subgroups with specific exceptional properties in large data sets. In the social sciences, on the other hand, there are decades of research on how to model structural relations between random variables. One of the most popular and flexible methods in this field is structural equation modeling (SEM). SEM can, for example, be used to model complex change and growth processes using (latent) growth curve models (Meredith & Tisak, 1990), growth component models (Mayer et al., 2012), or latent change models (McArdle, 2009). Some recent extensions of growth models, such as growth mixture modeling (GMM; Muthén & Muthén, 2000) and structural equation model trees (SEM trees; Brandmaier et al., 2013), make it possible to discover potentially latent subgroups that are distinct with regard to their growth pattern. We use an alternative to GMM and SEM trees that has similar aims but builds on algorithms from subgroup discovery and exceptional model mining (Klösgen, 1996; Leman et al., 2008) to find the subgroups of interest in large data sets more efficiently. This approach, termed subgroup latent growth curve modeling (SubgroupLGCM), is similar to an approach recently applied to find subgroups in mediation models (Lemmerich et al., 2020). A key difference between SubgroupLGCM and GMM is that the former describes subgroups (patterns) in terms of manifest variables, whereas the latter builds on a latent class approach to identify unobserved subgroups. To the best of our knowledge, there is little to no overlap between the SEM and pattern mining fields, and we show that SEM can benefit from the algorithmic knowledge developed in other fields.
Within the SEM literature, there are both global tests (e.g., χ² tests of model fit) and local tests (e.g., Wald tests) that can serve as interestingness measures for quantifying differences between subgroups in the algorithm, but other measures, such as user-defined effect sizes, can be used as well. In an illustrative example, we show how the SubgroupLGCM algorithm can be applied in the social sciences. We use data from the National Educational Panel Study in Germany, which provides longitudinal data on the development of competencies, educational processes, and educational decisions. We investigate the trajectories of university students' study dropout intentions over four years and use the SubgroupLGCM algorithm to explore subgroups with exceptional trajectories. Limitations and other potential applications for computational social science are discussed.
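To make the search idea concrete, the following is a minimal Python sketch. It is not the SubgroupLGCM implementation described in the chapter. It assumes four equally spaced waves, a depth-one search over single attribute-value selectors, and a hypothetical quality function: the square root of the subgroup size times the absolute difference in mean slope between the subgroup and its complement. It does not fit a latent growth curve model in SEM software. Instead, it uses each person's ordinary least-squares slope, so it illustrates only the subgroup discovery loop. The attributes `field` and `job` and the toy data are invented for illustration.

```python
from statistics import mean

TIMES = [0, 1, 2, 3]  # assumed: four equally spaced annual waves


def ols_slope(times, values):
    """Least-squares slope of one person's scores regressed on time."""
    t_bar, y_bar = mean(times), mean(values)
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def mean_slope(rows):
    """Average individual OLS slope: a crude stand-in for an LGCM slope-factor mean."""
    return mean(ols_slope(TIMES, r["y"]) for r in rows)


def quality(sub, rest):
    """Hypothetical interestingness: coverage-weighted slope difference."""
    return len(sub) ** 0.5 * abs(mean_slope(sub) - mean_slope(rest))


def discover(rows, attributes, min_size=2):
    """Score every single attribute == value subgroup against its complement."""
    results = []
    for attr in attributes:
        for value in sorted({r[attr] for r in rows}):
            sub = [r for r in rows if r[attr] == value]
            rest = [r for r in rows if r[attr] != value]
            if len(sub) < min_size or len(rest) < min_size:
                continue  # skip subgroups too small to compare
            results.append((quality(sub, rest), f"{attr} == {value!r}", len(sub)))
    # Highest quality first (ties are broken by the description string)
    return sorted(results, reverse=True)


# Hypothetical toy data: "y" holds dropout-intention scores at the four waves
data = [
    {"field": "math", "job": "yes", "y": [2, 3, 4, 5]},
    {"field": "math", "job": "no", "y": [2, 3, 4, 5]},
    {"field": "math", "job": "yes", "y": [1, 2, 3, 4]},
    {"field": "arts", "job": "no", "y": [3, 3, 3, 3]},
    {"field": "arts", "job": "yes", "y": [2, 2, 2, 2]},
    {"field": "arts", "job": "no", "y": [4, 4, 4, 4]},
]

for score, description, size in discover(data, ["field", "job"]):
    print(f"{score:.3f}  {description}  (n = {size})")
```

In this toy data, `field == 'math'` and `field == 'arts'` tie at the top because the two fields differ only in their slopes. A real application would replace `quality` with a model-based measure, such as a χ² or Wald statistic from a multi-group growth model, and would search over conjunctions of selectors rather than single ones.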