ABSTRACT

CONTENTS 23.1 Introduction 505

23.2 Features of Classification Model Learning Algorithms 506

23.3 Decision Trees and Decision Stumps 510

23.4 Naïve Bayes 513

23.5 Neural Networks 514

23.6 Support Vector Machines 517

23.7 Further Reading 521

References 521

23.1 INTRODUCTION A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. A set of training examplesexamples with known output values-is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate’s measurements. The generalization performance of a learned model (how closely the target outputs and the model’s predicted outputs agree for patterns that have not been presented to the learning algorithm) would provide an indication of how well the model has learned the desired mapping.