ABSTRACT

Partial Least Squares (PLS) is a family of methods for modeling the relationships between two sets of variables [33, 261, 263, 264]. Whereas Principal Component Analysis (PCA) maximizes the variance of the input data, PLS extracts latent features by maximizing the covariance between two blocks of variables. PLS is a popular tool for regression, classification, and dimensionality reduction [16, 142, 196], especially in chemometrics. Recently, PLS has attracted considerable attention in the analysis of high-dimensional data across many fields [7, 32, 120, 202], such as medical diagnosis and bioinformatics [31, 44, 51, 52, 102, 185], because it is resistant to overfitting and has been shown to be effective even for data with severe collinearity among the variables [175, 189]. PLS can be applied to classification problems by encoding class membership in an appropriate indicator matrix; in this setting it is closely related to linear discriminant analysis [16]. PLS can also serve as a dimensionality reduction tool: after the relevant latent vectors are extracted, an appropriate classifier, such as a support vector machine [207], can be applied [198]. PLS extends to regression by treating the predictor and response variables as two blocks of variables. Furthermore, like principal component regression and ridge regression, PLS regression yields a shrinkage estimator [142], which can achieve a smaller mean squared error.
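The covariance-maximization idea above can be made concrete in a few lines. The sketch below (an illustration with synthetic data, not the full PLS algorithm) computes the first pair of PLS weight vectors as the leading singular vectors of the cross-covariance matrix between the two centered blocks, which is the directions that maximize the covariance between the blocks' projections; the PCA direction, which maximizes variance of X alone, is shown for contrast.

```python
import numpy as np

# Synthetic two-block data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                       # predictor block
Y = X @ rng.normal(size=(5, 2)) + 0.1 * rng.normal(size=(100, 2))  # response block

# Center both blocks (standard preprocessing for PLS).
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# The first PLS weight pair (w, c) maximizes Cov(Xc @ w, Yc @ c) over unit
# vectors; it is given by the leading singular vectors of Xc.T @ Yc.
U, s, Vt = np.linalg.svd(Xc.T @ Yc)
w, c = U[:, 0], Vt[0, :]

# Latent score vector: the first extracted PLS component of X.
t = Xc @ w

# PCA, by contrast, takes the direction maximizing Var(Xc @ v),
# i.e. the leading right singular vector of Xc itself.
v_pca = np.linalg.svd(Xc, full_matrices=False)[2][0]
```

Subsequent components are obtained the same way after deflating the blocks by the extracted scores; that deflation step is what the full (e.g. NIPALS-style) PLS algorithms add to this core computation.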