ABSTRACT

This chapter discusses the basics of principal component analysis (PCA), a powerful machine learning technique based on methods from linear algebra. Geometrically, PCA finds the most significant dimensions in a dataset, which enables us to reduce the dimensionality of the problem with minimal loss of information. PCA relies on eigenvector analysis, which is often carried out via the singular value decomposition (SVD). PCA can reveal structure that is not readily apparent from statistical analysis or other analytic techniques. Consequently, PCA can offer a different perspective on data, as compared to other machine learning techniques, most of which are ultimately statistically based. The covariance matrix plays a fundamental role in PCA. PCA training consists of diagonalizing the covariance matrix C. From the resulting diagonalized matrix, we can easily determine the most significant components, namely the principal components. Several linear algebraic techniques can be used to diagonalize a matrix.
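
To make the procedure concrete, below is a minimal sketch of PCA via eigendecomposition of the covariance matrix, written in Python with NumPy. The toy data, variable names, and the choice to keep k = 2 components are illustrative assumptions, not material from the chapter.

    import numpy as np

    # Toy data: 200 samples of 3 features with very different variances
    # (illustrative values only)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.1])

    # Center the data; the covariance computation assumes zero-mean features
    X_centered = X - X.mean(axis=0)

    # Covariance matrix C (3 x 3)
    C = np.cov(X_centered, rowvar=False)

    # Diagonalize C. Since C is symmetric, eigh applies; the eigenvalues
    # are the variances along the principal directions.
    eigenvalues, eigenvectors = np.linalg.eigh(C)

    # eigh returns eigenvalues in ascending order; sort descending so the
    # most significant components come first
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # Keep the k most significant components and project the data onto
    # them, reducing dimensionality with minimal loss of information
    k = 2
    X_reduced = X_centered @ eigenvectors[:, :k]
    print("fraction of variance retained:",
          eigenvalues[:k].sum() / eigenvalues.sum())

    # Equivalently, the SVD of the centered data yields the same components:
    # X_centered = U @ diag(s) @ Vt, where the rows of Vt are the
    # eigenvectors of C and s**2 / (n - 1) are its eigenvalues.
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

The eigh route and the SVD route are interchangeable here (up to sign); the SVD is often preferred in practice because it avoids forming C explicitly.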