ABSTRACT

This chapter discusses Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) as ways of reducing the number of features in a dataset by generating combinations of the given attributes and projecting the data into a lower-dimensional space. Dimensionality reduction is, in this sense, a form of feature extraction. Its objective is to express the data in the most meaningful basis possible. The reduction itself comes from representing the data using only those principal components that contribute most to the variance of the dataset. The dimensionality reduction obtained with SVD underlies some techniques used in document analysis, such as latent semantic analysis, where a term-document matrix is used as the basis to obtain linearly independent components. Content-based filtering requires us to specify the attributes or features that describe the items in our database.
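
As an illustration of the idea of keeping only the principal components that contribute most to the variance, the following is a minimal sketch of PCA computed through SVD. It assumes NumPy is available. The function name pca_via_svd, the parameter k, and the toy data are hypothetical choices made for this example and are not taken from the chapter.

```python
import numpy as np


def pca_via_svd(X, k):
    """Project the rows of X onto the k principal components of largest variance.

    Hypothetical helper for illustration: X is an (n_samples, n_features)
    array and k is the number of components to keep.
    """
    X = np.asarray(X, dtype=float)

    # Centre each feature so the singular vectors align with directions of variance.
    X_centered = X - X.mean(axis=0)

    # Thin SVD: X_centered = U @ diag(s) @ Vt, with singular values in descending order.
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

    # Each squared singular value is proportional to the variance along its component.
    explained_variance = s ** 2 / (X.shape[0] - 1)
    explained_ratio = explained_variance / explained_variance.sum()

    # Keep the first k right singular vectors (the principal directions) and project.
    components = Vt[:k]
    projected = X_centered @ components.T
    return projected, components, explained_ratio[:k]


# Toy data (an assumption for this sketch): three features, two of which are
# strongly correlated, so a single component captures most of the variance.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2.0 * t + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])

Z, components, ratio = pca_via_svd(X, k=1)
print("reduced shape:", Z.shape)
print("fraction of variance retained:", ratio.sum())
```

Here the three original features are replaced by a single derived feature, a linear combination of the original attributes. In practice k is often chosen by inspecting the explained-variance ratios rather than being fixed in advance.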