ABSTRACT

Rapid advances in high-throughput sequencing technologies have led to an accumulation of high-dimensional biological data. Extracting meaningful biological information from such data is challenging. Often, the most informative structure in these data can be represented in a smaller number of interpretable dimensions. Dimensionality reduction is a technique that projects high-dimensional data onto a lower-dimensional space while preserving the features that are most relevant for analysis. Principal Component Analysis (PCA) is a widely used method for multivariate analysis and dimensionality reduction. The principal components are mutually orthogonal and are ordered by the variance they capture: the first principal component accounts for the largest share of the variance in the dataset, and the first few components together typically capture most of it. In this chapter, we introduce PCA and provide a step-by-step guide to performing PCA on sample biological data using the PCA class of the scikit-learn library.
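As a brief illustration of the two properties summarized above (orthogonal components, ordered by decreasing variance), the following is a minimal sketch using scikit-learn's `sklearn.decomposition.PCA`. It assumes a synthetic samples-by-genes matrix generated from three hidden factors plus noise; this matrix is a stand-in, not the sample data used in the chapter, and the sizes (50 samples, 200 genes, 10 components) are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for a samples x genes matrix (assumption: not real data).
# Three hidden factors drive most of the variation; a little noise is added.
rng = np.random.default_rng(0)
n_samples, n_genes = 50, 200
latent = rng.normal(size=(n_samples, 3))
loadings = rng.normal(size=(3, n_genes))
X = latent @ loadings + 0.1 * rng.normal(size=(n_samples, n_genes))

# PCA centers each feature internally, then finds the principal axes.
pca = PCA(n_components=10)
scores = pca.fit_transform(X)  # samples projected onto the first 10 PCs

print(scores.shape)  # (50, 10)

# Fraction of total variance explained by each PC, largest first.
print(pca.explained_variance_ratio_.round(3))

# The principal axes (rows of components_) are orthonormal: W W^T = I.
W = pca.components_
print(np.allclose(W @ W.T, np.eye(10)))  # True

# The variance of the scores along each PC matches explained_variance_,
# which is sorted in non-increasing order (PC1 has the largest variance).
print(np.allclose(scores.var(axis=0, ddof=1), pca.explained_variance_))  # True
```

Because the synthetic data have only three underlying factors, the first three entries of `explained_variance_ratio_` should account for nearly all of the variance, mirroring the idea that a few leading components can summarize high-dimensional data.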