ABSTRACT

When the input data lie in a high-dimensional space, dimensionality reduction techniques, such as Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA), Partial Least Squares (PLS), and Linear Discriminant Analysis (LDA), are commonly applied as a separate data preprocessing step before classification algorithms. One limitation of this approach is the weak connection between the dimensionality reduction step and the classification algorithm: dimensionality reduction methods such as CCA and PLS and classifiers such as the Support Vector Machine (SVM) optimize different criteria, and it is unclear which dimensionality reduction algorithm can best improve a specific classifier such as SVM. In addition, most traditional dimensionality reduction algorithms assume that a common set of samples is shared by all classes. In many applications, however, e.g., when the data are unbalanced, it is desirable to relax this restriction so that the input data associated with each class can be better balanced. This is especially useful when some of the class labels in the data are missing.
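The decoupling described above can be made concrete with a minimal sketch: an unsupervised PCA step (computed via SVD) followed by an independently trained classifier (here a simple nearest-centroid rule, not the paper's method). The synthetic data, dimensions, and classifier are illustrative assumptions, chosen only to show that the reduction step never sees the criterion the classifier optimizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic high-dimensional data: two well-separated classes in 50 dimensions.
X0 = rng.normal(0.0, 1.0, size=(40, 50))
X1 = rng.normal(1.0, 1.0, size=(40, 50))
X = np.vstack([X0, X1])
y = np.array([0] * 40 + [1] * 40)

# Step 1: PCA to k dimensions -- maximizes retained variance,
# with no knowledge of the labels used in step 2.
k = 5
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T  # projected data, shape (80, k)

# Step 2: train a classifier on the reduced data -- it optimizes
# classification accuracy, a criterion step 1 never considered.
mu0 = Z[y == 0].mean(axis=0)
mu1 = Z[y == 1].mean(axis=0)
pred = (np.linalg.norm(Z - mu1, axis=1)
        < np.linalg.norm(Z - mu0, axis=1)).astype(int)
accuracy = (pred == y).mean()
print(Z.shape)  # (80, 5)
```

Because nothing ties the PCA projection to the classifier's objective, a direction that PCA discards as low-variance may still be the one most useful for separating the classes; this is the weak connection the abstract refers to.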