ABSTRACT

Principal component analysis (PCA) is an essential tool in applications ranging from image and video processing and web data analysis to bioinformatics. PCA seeks the best low-dimensional approximation to high-dimensional data. In particular, suppose the data matrix A ∈ Rm×n is of the form A := Lo + No, where Lo is a low-rank matrix, i.e., rank(Lo) ≪ min{m,n}, and No is a dense, small-magnitude noise matrix. PCA generates a rank-r̄ approximation to Lo by solving min_L {‖L − A‖ : rank(L) ≤ r̄}, which requires computing one singular value decomposition (SVD) of A; here ‖X‖ denotes the spectral norm of X ∈ Rm×n, i.e., the maximum singular value of X. However, when the observed data is corrupted by gross errors, classical PCA becomes impractical: even a single grossly corrupted observation can distort the underlying low-rank structure so severely that it cannot be recovered by PCA. To remedy this shortcoming, an extended model called robust PCA (RPCA) was considered by Wright et al. [62], Candès et al. [12] and Chandrasekaran et al. [13] for the case where the gross errors are sparse. In this model it is assumed that the data matrix A ∈ Rm×n is of the form A := Lo + So, where Lo is a low-rank matrix as in PCA, and So is a sparse gross-error matrix, i.e., only a small number of its entries are nonzero. The model attempts to decompose the observed matrix A into a sum of two matrices: a low-rank approximation to A, and a sparse matrix that captures the gross errors. Under certain incoherence assumptions on Lo and a randomness assumption on the sparsity pattern of So, Candès et al. [12] showed that one can recover (Lo, So) exactly with very high probability as the unique solution of a convex optimization problem called Principal Component Pursuit (PCP).
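
To make the two models concrete, the following is a minimal numerical sketch in Python/NumPy (not part of the original text). It forms a synthetic A = Lo + So, computes the classical rank-r PCA approximation by one SVD, and recovers (Lo, So) via the PCP problem min ‖L‖_* + λ‖S‖_1 subject to L + S = A, using a standard augmented-Lagrangian/ADMM iteration (singular value thresholding for L, elementwise soft thresholding for S). The weight λ = 1/√max(m,n) follows Candès et al. [12]; the solver choice, the step size μ, the tolerance, and the helper names (pcp_admm, svt, soft_threshold, pca_rank_r) are illustrative assumptions, not the method prescribed in this text.

    import numpy as np

    def svt(X, tau):
        """Singular value thresholding: prox of tau * (nuclear norm)."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return (U * np.maximum(s - tau, 0.0)) @ Vt

    def soft_threshold(X, tau):
        """Elementwise soft thresholding: prox of tau * (l1 norm)."""
        return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

    def pcp_admm(A, lam=None, mu=None, max_iter=500, tol=1e-7):
        """Solve min ||L||_* + lam*||S||_1  s.t.  L + S = A  by ADMM (a sketch)."""
        m, n = A.shape
        if lam is None:
            lam = 1.0 / np.sqrt(max(m, n))              # weight from Candes et al. [12]
        if mu is None:
            mu = 0.25 * m * n / (np.abs(A).sum() + 1e-12)  # heuristic step size (assumption)
        L = np.zeros_like(A); S = np.zeros_like(A); Y = np.zeros_like(A)
        norm_A = np.linalg.norm(A, 'fro')
        for _ in range(max_iter):
            L = svt(A - S + Y / mu, 1.0 / mu)              # low-rank update
            S = soft_threshold(A - L + Y / mu, lam / mu)   # sparse update
            R = A - L - S                                  # primal residual
            Y = Y + mu * R                                 # dual update
            if np.linalg.norm(R, 'fro') <= tol * norm_A:
                break
        return L, S

    def pca_rank_r(A, r):
        """Classical PCA: best rank-r approximation of A via one SVD."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return (U[:, :r] * s[:r]) @ Vt[:r]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        m, n, r = 100, 80, 5
        Lo = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # low-rank part
        So = np.zeros((m, n))
        mask = rng.random((m, n)) < 0.05                   # ~5% gross corruptions
        So[mask] = 10.0 * rng.standard_normal(mask.sum())
        A = Lo + So
        L_pca = pca_rank_r(A, r)
        L_hat, S_hat = pcp_admm(A)
        rel = lambda X: np.linalg.norm(X - Lo, 'fro') / np.linalg.norm(Lo, 'fro')
        print("classical PCA relative error in Lo:", rel(L_pca))
        print("PCP (ADMM) relative error in Lo:  ", rel(L_hat))

In this synthetic setting the rank-r SVD approximation is visibly contaminated by the sparse gross errors, whereas the PCP solution recovers Lo to high accuracy, which is the behavior the abstract describes.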