ABSTRACT

Integrating omics data sets measured on the same N samples offers an opportunity to assess molecular interactions at multiple functional levels and to gain a more comprehensive understanding of biological pathways. However, integration is challenging because of data heterogeneity, platform variability and the lack of a clear biological question. The multi-block methods in mixOmics address this complexity through dimension reduction, by maximising the covariance between pairs of components from different data sets, and through feature selection. In this way, molecular biomarkers across different functional levels that are correlated with each other and associated with a phenotype of interest can be identified. This chapter explains the principles of the multi-block variant of PLS-DA. It describes the key input arguments: the design matrix, which specifies which data set blocks are connected, and to what extent, when maximising the covariance between components; the number of components; and the number of features to select. Key graphical and numerical outputs, as well as cross-validation for model assessment, are also described. The breast.TCGA multi-omics study available in mixOmics, which includes mRNA and miRNA expression levels and protein abundances, is analysed. The chapter also presents an example of prediction on a test set and other multi-block variants that may suit different data settings and research questions, and concludes with a FAQ.