ABSTRACT

The relative ease with which high-throughput data can be collected on the same sampling units has raised interest in relating high-dimensional data sets. This problem can be framed as a multivariate regression model, with one data set treated as a response matrix and the other as a covariate matrix. The high dimensionality of both the response and covariate matrices, along with the complex correlation structures within each, raises statistical challenges. Here, we review a stochastic partitioning method that addresses this problem by fitting a mixture of regression models with variable selection, thereby uncovering group structures and key relationships between the data sets in a unified manner. Variable selection in this context poses added difficulties compared to standard regression settings, as the membership of objects in different components has to be learned simultaneously with the search for component-specific predictors. We emphasize the practical applicability of the method using real data and illustrate features of the associated R package SPaVS.