ABSTRACT

The results of a chromatographic experiment can often be arranged as a multivariate dataset. Linear supervised techniques are used when a model (an equation) is needed to predict a given property from such data. In the simplest case, the modeling problem can be solved by classical linear regression, called multiple linear regression (MLR) or ordinary least squares (OLS) regression. Although there are more equations than unknowns, so that a single compromise (least-squares) solution can be obtained, this method suffers from the multicollinearity common in chemometric datasets, which makes the estimated coefficients unstable and the model prone to overfitting. An overfitted model fits the calibration data almost without error, but the modeled dependence is meaningless. Careful validation of the model is therefore essential in chemometrics, both to rule out overfitting and lack of fit and to estimate the real predictive ability.
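
As an illustration only, the following minimal sketch fits an OLS model to simulated, strongly collinear data and compares the calibration error with the error on an independent test set. Everything in it is an assumption made for the example and not taken from the text: the simulated data (20 variables driven by 2 latent factors), the noise levels, and the helper names simulate, rmse and ols_calibration_vs_test. A plain hold-out test set stands in for whatever validation scheme is actually used.

```python
import numpy as np


def simulate(n_samples, n_vars, rng, loadings, coef, x_noise=0.01, y_noise=0.1):
    """Simulate collinear predictors driven by a few latent factors (hypothetical data)."""
    scores = rng.normal(size=(n_samples, loadings.shape[0]))
    # Each variable is a mixture of the same few factors plus a little noise,
    # so the columns of X are almost linearly dependent.
    X = scores @ loadings + x_noise * rng.normal(size=(n_samples, n_vars))
    y = scores @ coef + y_noise * rng.normal(size=n_samples)
    return X, y


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def ols_calibration_vs_test(seed=0, n_cal=30, n_test=200, n_vars=20, n_factors=2):
    """Fit OLS on a small calibration set and evaluate it on independent samples."""
    rng = np.random.default_rng(seed)
    loadings = rng.normal(size=(n_factors, n_vars))
    coef = rng.normal(size=n_factors)
    X_cal, y_cal = simulate(n_cal, n_vars, rng, loadings, coef)
    X_test, y_test = simulate(n_test, n_vars, rng, loadings, coef)

    # More equations (30) than unknowns (20): least squares gives one compromise solution.
    b, *_ = np.linalg.lstsq(X_cal, y_cal, rcond=None)

    return {
        "condition_number": float(np.linalg.cond(X_cal)),  # large value signals collinearity
        "rmse_cal": rmse(y_cal, X_cal @ b),                # error on the calibration data
        "rmse_test": rmse(y_test, X_test @ b),             # error on unseen data
    }


if __name__ == "__main__":
    for name, value in ols_calibration_vs_test().items():
        print(f"{name}: {value:.4g}")
```

In a setup like this, the calibration error is expected to look optimistic compared with the test error, which is why the predictive ability should be judged on data not used for fitting.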