ABSTRACT

Pre-processing multivariate data is an essential step before analysis to ensure reliable statistical results. This chapter first details how data should be prepared before being input into the mixOmics package. Although it is not part of the package, data-specific normalisation to account for technical biases introduced during data generation is an important step, as is filtering out non-informative variables to reduce the methods’ computational time. The chapter then describes the effect of centering and scaling variables, illustrated with PCA on the linnerud data, and discusses the issues of missing values, confounders and batch effects. Finally, we describe in detail how to get started with mixOmics in the R environment to ensure the data are ready for various types of multivariate analyses.