ABSTRACT

High-dimensional data is both a blessing and a curse [8]: a blessing because it carries a great deal of information, and a curse because much of that information may be irrelevant, making the data harder to analyze. High-dimensional data is known to pose a wide variety of challenges in the field of statistics. Dimensionality Reduction, or DR, is therefore employed to deal with the issue of high-dimensional data. Specifically, DR is primarily applied when there is a need to:

1. Select only the relevant features from the given data (feature selection), or

2. Extract a lower-dimensional representation of the data because its current form is too difficult to analyze (feature extraction), as sketched below.
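
The following is a minimal sketch, not taken from this chapter, contrasting the two strategies on a toy dataset; it assumes scikit-learn is available and uses a univariate F-score selector and PCA purely as representative examples of feature selection and feature extraction, respectively.

```python
# Hypothetical illustration (not the book's code): feature selection vs. feature extraction.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)  # 150 samples, 4 original features

# Feature selection: keep 2 of the original features,
# ranked here by a univariate ANOVA F-score against the labels.
X_selected = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Feature extraction: construct 2 new features as linear combinations
# of all original features (principal components).
X_extracted = PCA(n_components=2).fit_transform(X)

print(X.shape, X_selected.shape, X_extracted.shape)  # (150, 4) (150, 2) (150, 2)
```

Both paths end with two-dimensional data, but selection retains a subset of the original measurements, whereas extraction builds new derived features.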