CONTENTS
18.1 Introduction
18.2 Kernel Functions
18.3 Kernel PCA
18.4 EM for Kernel PCA and On-line PCA
18.5 Choosing the Number of Components with Information Complexity
18.6 Computational Results
18.7 Conclusions
Acknowledgments
References

ABSTRACT

In this paper, we deal with modelling, or extracting information from, an unlabelled data sample. In many real-world applications, appropriate preprocessing transformations of high-dimensional input data can increase the overall performance of algorithms. Feature extraction tries to find a compact description of the interesting features of the data. This is useful for visualizing high-dimensional data in two or three dimensions, or for data compression. It can also serve as a preprocessing step that reduces the dimension of the data to be handled by a subsequent model. In this paper, we concentrate mainly on kernel PCA for feature extraction in a higher-dimensional feature space. We first discuss the usefulness of the expectation-maximization (EM) algorithm for standard PCA. We then present kernel PCA, a nonlinear extension of PCA based on the kernel transformation (Schölkopf, Smola, and Müller 1997). Kernel PCA requires the eigenvalue decomposition of a so-called kernel matrix of size N×N, where N is the number of data points. In this contribution, we propose an EM approach for performing kernel principal component analysis. Moreover, we introduce an online EM algorithm for PCA. We show this to be a computationally efficient method, especially when the number of data points is large. The information criteria of Bozdogan, together with other criteria, are used to decide the number of eigenvalues (that is, the number of principal components) to retain.
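The central computational idea here is to replace a full eigendecomposition of the N×N kernel matrix with EM iterations. The following is a minimal sketch of that idea, under stated assumptions. It uses the zero-noise-limit EM iteration for PCA. The E-step computes latent coordinates from the current component, and the M-step refits the component. Both steps are written in terms of the centred kernel matrix by expressing the component in feature space as c = Σ_n α_n φ(y_n). The sketch extracts only the leading component. The kernels, the `gamma` value, and every function name (`centered_kernel_matrix`, `em_kernel_pca_first`, `project`, and so on) are hypothetical illustrations, not the chapter's algorithm. With one component, each EM sweep amounts to a rescaled power iteration, so a step costs a single O(N²) product of the kernel matrix with a vector.

```python
# Hypothetical sketch: zero-noise EM for the leading kernel principal
# component. Not the chapter's exact algorithm; pure Python for portability.
import math
import random


def linear_kernel(a, b):
    """k(a, b) = <a, b>; with this kernel, kernel PCA reduces to ordinary PCA."""
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||a - b||^2); gamma=0.5 is an arbitrary choice."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def centered_kernel_matrix(data, kernel):
    """N x N kernel matrix centred in feature space: K - 1K - K1 + 1K1."""
    n = len(data)
    k = [[kernel(a, b) for b in data] for a in data]
    means = [sum(row) / n for row in k]  # row means equal column means (K symmetric)
    grand = sum(means) / n
    return [[k[i][j] - means[i] - means[j] + grand for j in range(n)]
            for i in range(n)]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def em_kernel_pca_first(kc, max_iter=1000, tol=1e-12, seed=0):
    """Leading component c = sum_n alpha_n phi(y_n) of a centred kernel matrix kc.

    E-step: latent coordinates x = K alpha / (alpha^T K alpha)
    M-step: alpha = x / (x^T x)
    Returns (alpha, lam) with alpha scaled so that ||c||^2 = alpha^T K alpha = 1
    and lam the corresponding eigenvalue of kc.
    """
    rng = random.Random(seed)
    alpha = [rng.gauss(0.0, 1.0) for _ in range(len(kc))]
    prev = None
    for _ in range(max_iter):
        k_alpha = matvec(kc, alpha)
        c_norm2 = dot(alpha, k_alpha)  # ||c||^2 in feature space
        x = [v / c_norm2 for v in k_alpha]  # E-step
        xx = dot(x, x)
        alpha = [v / xx for v in x]  # M-step
        length = math.sqrt(dot(alpha, alpha))
        unit = [v / length for v in alpha]
        if prev is not None and max(abs(u - p) for u, p in zip(unit, prev)) < tol:
            break
        prev = unit
    c_norm = math.sqrt(dot(alpha, matvec(kc, alpha)))
    alpha = [v / c_norm for v in alpha]  # now alpha^T K alpha = 1
    lam = 1.0 / dot(alpha, alpha)  # Rayleigh quotient, since alpha^T K alpha = 1
    return alpha, lam


def project(kc, alpha):
    """Scores of the training points on the component: (K alpha)_n."""
    return matvec(kc, alpha)


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.gauss(0, 1), rng.gauss(0, 0.3)) for _ in range(30)]
    kc = centered_kernel_matrix(pts, rbf_kernel)
    alpha, lam = em_kernel_pca_first(kc)
    print("leading eigenvalue of the centred kernel matrix:", lam)
    print("first five scores:", project(kc, alpha)[:5])
```

With `linear_kernel`, the recovered eigenvalue should equal the largest eigenvalue of the scatter matrix of the centred data. This gives a simple check against ordinary PCA. Further components could be obtained by deflation, or by running the EM updates with several columns at once. The sketch attempts neither, nor the online variant.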