ABSTRACT

High-dimensional data include thousands of variables. Integrating these data sets has the potential to uncover the hierarchical and holistic mechanisms that govern biological pathways. This chapter introduces multivariate analysis for data integration, a promising and complementary alternative to classical univariate analysis, to obtain a more complete picture of a biological system. By adopting a holistic approach, we are moving away from understanding biology as a linear process involving a few genes towards understanding it as a dynamic system.

While univariate methods aim to make inferences about a population, they cannot parse the molecular interactions implicit in these data sets. Multivariate methods adopt a different paradigm: hypotheses are less causal-driven, more exploratory and often data-driven. Multivariate methods are able to manage high-dimensional and multi-omics data, but are yet to be fully developed: overfitting and multi-collinearity must be managed, and the size, heterogeneity and differing platforms inherent in high-dimensional data remain a challenge. When successful, multivariate approaches can generate novel, systems-level hypotheses that can be further validated through more traditional univariate hypotheses.