ABSTRACT

Partial Least Squares (PLS) is a multivariate methodology that relates two data matrices (e.g. transcriptomics and metabolomics data) by maximising the covariance between their latent variables, or components. PLS is a flexible method that can manage noisy and correlated variables as well as missing values, and can simultaneously model several response variables. It can therefore address different types of integration problems and research questions, which is why it is the backbone of most methods in mixOmics. This chapter explains the principle of PLS for a univariate response vector and for a multivariate response matrix. It describes the different modes: regression mode, where one data set is expected to predict the response of the other, and canonical mode, which models a symmetrical linear relationship between the two data sets. The tuning of the input arguments, including the PLS mode, the number of components and the features to select, is outlined, along with key graphical and numerical outputs. The liver.toxicity study provided in mixOmics, which includes gene expression levels and clinical marker data, is analysed in depth to showcase the available PLS variants. Further PLS extensions are described and a FAQ is provided.