Principal Component Analysis (PCA) | 12 | Multivariate Data Integratio

ABSTRACT

Principal Component Analysis (PCA) is a valuable first step for understanding the main sources of variation in a single data set, whether those sources are biological or technical in nature. This chapter describes PCA and two variants: sparse PCA, which selects relevant variables or features, and PCA solved with NIPALS, which handles missing values. The key input arguments for PCA are outlined: whether to center and scale the data, how to choose the number of components, and how many features to select for the sparse method. The key outputs, both graphical and numerical, are then described. The multidrug case study available from mixOmics includes ABC transporter and drug compound data, which are analysed in detail with PCA, sparse PCA, and PCA with NIPALS. Numerical results and graphical outputs are presented to aid interpretation. The chapter concludes with further extensions and a FAQ.