ABSTRACT

Multivariate omics data generally consist of a large number of variables but a small number of samples. Assessing how well a chosen model fits these data without overfitting is therefore an integral part of any analysis. In mixOmics, several types of measures are used to evaluate the performance of the chosen model. This chapter first introduces performance assessment in a multivariate analysis context, as well as the concepts of subsampling via training and testing sets, and cross-validation. Performance measures for both regression and classification techniques are described. The chapter then explains the concept of tuning to choose the optimal parameters for the chosen model, such as the number of components and the number of features to select. Once these parameters are chosen, the performance of the final model can be assessed. Finally, as the ultimate aim of most research is to generalise the chosen model to new data sets, the chapter describes the process of predicting a continuous response or categorical outcome.