Marco A. Wiering Institute of Artificial Intelligence and Cognitive Engineering, University of Groningen
Lambert R.B. Schomaker Institute of Artificial Intelligence and Cognitive Engineering, University of Groningen
20.1 Introduction
20.2 Multi-Layer Support Vector Machines for Regression Problems
20.3 Multi-Layer Support Vector Machines for Classification Problems
20.4 Multi-Layer Support Vector Machines for Dimensionality Reduction
20.5 Experiments and Results
20.5.1 Experiments on Regression Problems
20.5.2 Experiments on Classification Problems
20.5.3 Experiments on Dimensionality Reduction Problems
20.5.4 Experimental Analysis of the Multi-Layer SVM
20.6 Discussion and Future Work
Acknowledgments
Bibliography
20.1 Introduction
Support vector machines (SVMs) [24, 8, 20, 22] and other learning algorithms based on kernels have been shown to obtain very good results on many different classification and regression datasets. SVMs have the advantage of generalizing very well, but the standard SVM is limited in several ways. First, the SVM uses a single layer of support vector coefficients and is therefore a shallow model. Deep architectures [17, 14, 13, 4, 25, 6] have been shown to be very promising alternatives to these shallow models. Second, the results of the SVM rely heavily on the selected kernel function, but most kernel functions
have limited flexibility in the sense that they are not trainable on a dataset. Therefore, it is a natural step to go from the standard single-layer SVM to the multi-layer SVM (ML-SVM). Just as the invention of the backpropagation algorithm [26, 19] made it possible to construct multi-layer perceptrons from perceptrons, this chapter describes techniques for constructing and training multi-layer SVMs consisting only of SVMs.
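
To make the two limitations concrete, the following minimal sketch evaluates the output of a standard single-layer SVM as one weighted sum of kernel evaluations, f(x) = sum_i c_i K(x_i, x) + b. It is an illustration only, not the chapter's method: the support vectors, signed coefficients, bias, and the value of gamma are hypothetical numbers chosen by hand rather than obtained by training, and the RBF kernel is assumed as one common choice of fixed kernel.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Fixed RBF kernel exp(-gamma * ||x - z||^2).

    gamma is set by hand here; nothing in the kernel is learned from data.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_output(x, support_vectors, coefficients, bias, gamma=1.0):
    """Single-layer kernel expansion: f(x) = sum_i c_i * K(x_i, x) + b."""
    return sum(
        c * rbf_kernel(sv, x, gamma)
        for sv, c in zip(support_vectors, coefficients)
    ) + bias


# Hypothetical toy model (assumed values, not trained).
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
coefficients = [1.0, -1.0]  # signed support vector coefficients
bias = 0.0

f_near_first = svm_output((0.0, 0.0), support_vectors, coefficients, bias)
f_near_second = svm_output((1.0, 1.0), support_vectors, coefficients, bias)
print(f_near_first > 0, f_near_second < 0)
```

The output depends on the data only through the single set of coefficients and the hand-picked kernel, which is the sense in which the model is shallow and the kernel is not trainable.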