ABSTRACT

Numerical techniques for data analysis and feature extraction are discussed using the framework of matrix rank reduction. The singular value decomposition (SVD) and its properties are reviewed, and its relation to latent semantic indexing (LSI) and principal component analysis (PCA) is described. Methods that approximate the SVD are reviewed. A few basic methods for linear regression, in particular the partial least squares (PLS) method, are presented and analyzed as rank reduction methods. Methods for feature extraction based on centroids and on classical linear discriminant analysis (LDA) are described, as is an improved LDA based on the generalized singular value decomposition (LDA/GSVD). The effectiveness of these methods is illustrated using examples from information retrieval and from two-dimensional representation of clustered data.