ABSTRACT

Support vector machines (SVMs) offer a direct approach to binary classification: find a hyperplane in some feature space that “best” separates the two classes. The SVM then classifies all points on one side of the decision boundary as belonging to one class and all points on the other side as belonging to the other class. SVMs use the kernel trick to enlarge the feature space through basis expansions, and they can also be extended to regression problems. In essence, SVMs find a separating hyperplane in an enlarged feature space, which generally yields a nonlinear decision boundary in the original feature space with good generalization performance. SVMs typically have to estimate at least as many parameters as there are rows in the training data; hence, they are more commonly used in wide data situations, where the number of features is large relative to the number of observations. To obtain predicted class probabilities from an SVM, additional parameters must be estimated as described by J. C. Platt, a procedure commonly known as Platt scaling.
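As a concrete illustration (not part of the original abstract), the sketch below fits a kernel SVM and requests Platt-scaled class probabilities using scikit-learn; the synthetic data set and the hyperparameter values are illustrative assumptions, not settings taken from the text.

    # Minimal sketch: kernel SVM with Platt-scaled class probabilities.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Toy binary classification problem (illustrative: 100 rows, 20 features).
    X, y = make_classification(n_samples=100, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # The RBF kernel implicitly enlarges the feature space (the kernel trick);
    # probability=True fits the additional sigmoid parameters (Platt scaling)
    # so that predict_proba() returns class probabilities.
    svm = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True, random_state=0)
    svm.fit(X_train, y_train)

    print(svm.predict(X_test[:5]))        # class labels: side of the decision boundary
    print(svm.predict_proba(X_test[:5]))  # Platt-scaled class probabilities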