Chapter 20

Multi-Layer Support Vector Machines

Marco A. Wiering and Lambert R.B. Schomaker

Marco A. Wiering, Institute of Artificial Intelligence and Cognitive Engineering, University of Groningen

Lambert R.B. Schomaker, Institute of Artificial Intelligence and Cognitive Engineering, University of Groningen

20.1 Introduction
20.2 Multi-Layer Support Vector Machines for Regression Problems
20.3 Multi-Layer Support Vector Machines for Classification Problems
20.4 Multi-Layer Support Vector Machines for Dimensionality Reduction
20.5 Experiments and Results
    20.5.1 Experiments on Regression Problems
    20.5.2 Experiments on Classification Problems
    20.5.3 Experiments on Dimensionality Reduction Problems
    20.5.4 Experimental Analysis of the Multi-Layer SVM
20.6 Discussion and Future Work
Acknowledgments
Bibliography

20.1 Introduction

Support vector machines (SVMs) [24, 8, 20, 22] and other learning algorithms based on kernels have been shown to obtain very good results on many different classification and regression datasets. SVMs have the advantage of generalizing very well, but the standard SVM is limited in several ways. First, the SVM uses a single layer of support vector coefficients and is therefore a shallow model. Deep architectures [17, 14, 13, 4, 25, 6] have been shown to be very promising alternatives to these shallow models. Second, the results of the SVM rely heavily on the selected kernel function, but most kernel functions have limited flexibility in the sense that they are not trainable on a dataset. Therefore, it is a natural step to go from the standard single-layer SVM to the multi-layer SVM (ML-SVM). Just as the invention of the backpropagation algorithm [26, 19] made it possible to construct multi-layer perceptrons from perceptrons, this chapter describes techniques for constructing and training multi-layer SVMs consisting only of SVMs.