Miscellaneous: Cluster Analysis of “Big Data” Using Mixture Models

ABSTRACT

This chapter discusses issues that arise when mixture models are used for cluster analysis of big data. Big data are now commonplace in medical and health sciences research. Information about unknown group structures extracted from these data can help improve the quality, effectiveness, and cost-effectiveness of prevention, treatment, and care, and so contribute to a sustainable health system. Big data in medical and health sciences often comprise data collected from multiple sources or platforms, possibly at both the individual and population levels via data linkage. Because experimental designs collect measurements from different data sources, many real problems in cluster analysis of big data involve multivariate data with a mix of variable types. For example, in medical and health sciences, feature variables may be measurements on individual patients drawn from complementary data sources such as radiology images, epidemiological and clinical data, and Medicare and healthcare service use data obtained via data linkage.