Canonical Correlation Analysis (CCA) | 14 | Multivariate Data Integrat

ABSTRACT

Like PLS, CCA is an unsupervised method that integrates two data sets of quantitative variables measured on the same samples. However, whereas PLS maximises the covariance between components, CCA maximises the correlation between canonical variates (also called components). In CCA, the two data sets play symmetrical roles. This chapter explains the principles of CCA and of its regularised version, rCCA, which handles large data sets with highly correlated features, and shows how to choose the regularisation parameters optimally. Key outputs from CCA and rCCA are described. The nutrimouse case study provided in mixOmics, which includes gene expression and lipid abundance data, is analysed with both CCA and rCCA.
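To make the idea of maximising the correlation between canonical variates concrete, the following is a minimal NumPy sketch, not the mixOmics implementation (mixOmics is an R package). It assumes a ridge-type regularisation in which `lambda1` and `lambda2` are added to the diagonals of the two within-set covariance matrices. The function name `rcca`, its arguments, and the simulated data are all hypothetical and chosen for illustration only.

```python
import numpy as np

def rcca(X, Y, lambda1=0.0, lambda2=0.0, ncomp=2):
    """Sketch of (ridge-regularised) CCA via whitening and an SVD.

    Assumption: regularisation adds lambda1 / lambda2 to the diagonal of
    the within-set covariance matrices. With both set to 0 this reduces
    to classical CCA, which requires those matrices to be invertible.
    """
    # Centre each data set column-wise
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Within-set (regularised) and between-set covariance matrices
    Cxx = X.T @ X / (n - 1) + lambda1 * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + lambda2 * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Inverse square root of a symmetric positive definite matrix
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)

    # Singular values of the whitened cross-covariance are the
    # (regularised) canonical correlations, in decreasing order
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    A = Kx @ U[:, :ncomp]      # coefficient vectors for X
    B = Ky @ Vt.T[:, :ncomp]   # coefficient vectors for Y
    return s[:ncomp], X @ A, Y @ B

# Simulated example: two data sets sharing one latent signal
rng = np.random.default_rng(0)
z = rng.normal(size=(100, 1))
X = z @ rng.normal(size=(1, 3)) + rng.normal(size=(100, 3))
Y = z @ rng.normal(size=(1, 4)) + rng.normal(size=(100, 4))

rho, U, V = rcca(X, Y)                                   # classical CCA
rho_reg, _, _ = rcca(X, Y, lambda1=0.5, lambda2=0.5)     # ridge-regularised
```

Both data sets enter the computation in the same way, which reflects their symmetrical roles. In this sketch, the regularised values in `rho_reg` are shrunk relative to `rho`; when the number of variables approaches or exceeds the number of samples, the unregularised covariance matrices become ill-conditioned or singular, which is the situation rCCA is meant to address.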