ABSTRACT

Principal component analysis (PCA) is the workhorse of high-dimensional data analysis and dimensionality reduction, with numerous applications in statistics, engineering, and the biobehavioral sciences; see, e.g., [26]. Nowadays, ubiquitous e-commerce sites, the Web, and urban traffic surveillance systems generate massive volumes of data. As a result, the problem of extracting the most informative, yet low-dimensional, structure from high-dimensional datasets is of paramount importance [22, 45]. To this end, PCA provides least-squares (LS) optimal linear approximants in R^q to a dataset in the ambient space R^p, for q ≤ p. The desired linear subspace is spanned by the q dominant eigenvectors of the sample data covariance matrix, or equivalently by the q dominant singular vectors of the centered data matrix [26].
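The equivalence between the eigenvector and singular-vector characterizations can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes NumPy, a data matrix X with n observations in rows and p variables in columns, and synthetic Gaussian data; the function names `pca_subspace_eig` and `pca_subspace_svd` are hypothetical and chosen only for illustration.

```python
import numpy as np

def pca_subspace_eig(X, q):
    """Basis for the PCA subspace from the sample covariance matrix.

    Assumption: X is n x p, with one observation per row.
    """
    Xc = X - X.mean(axis=0)                  # center each variable
    C = Xc.T @ Xc / (X.shape[0] - 1)         # p x p sample covariance
    evals, evecs = np.linalg.eigh(C)         # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:q]      # indices of the q largest
    return evecs[:, order]                   # p x q, orthonormal columns

def pca_subspace_svd(X, q):
    """Same subspace from the right singular vectors of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:q].T                          # p x q, orthonormal columns

# Synthetic data (an assumption): p = 5 variables with well-separated variances.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) @ np.diag([5.0, 3.0, 1.0, 0.5, 0.1])

q = 2
U_eig = pca_subspace_eig(X, q)
U_svd = pca_subspace_svd(X, q)

# Individual basis vectors may differ in sign, so compare the orthogonal
# projectors onto the two subspaces instead of the bases themselves.
P_eig = U_eig @ U_eig.T
P_svd = U_svd @ U_svd.T
same_subspace = np.allclose(P_eig, P_svd, atol=1e-8)

# LS approximant: project the centered data onto the q-dimensional subspace.
Xc = X - X.mean(axis=0)
X_hat = Xc @ P_svd
ls_error = np.linalg.norm(Xc - X_hat) ** 2
print("subspaces agree:", same_subspace)
```

Comparing projectors rather than basis vectors is a deliberate choice here, since eigenvectors and singular vectors are only defined up to sign. In this sketch, `ls_error` should equal the sum of the squared discarded singular values of the centered data, which is the LS optimality property referred to above.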