ABSTRACT

Consider, in a very general sense, the problem of building a classifier from a set of training data. Given no prior information about the underlying distributions of the domain features and the class variable, it is increasingly common to design the classifier using a so-called non-parametric model, such as a decision tree or a neural network. For each type of model, a variety of well-understood algorithms exist to select the best model and its parameters from the training data. Traditionally, these algorithms rely on a goodness-of-fit measure evaluated on the training data, such as the classification error rate or the mean-squared error, to select the best model. However, empirical experience has consistently shown that relying on goodness of fit alone to select a model can be misleading: one can argue from a variety of viewpoints that a quantitative version of Occam's razor is required to select a model that appropriately trades off goodness of fit against model complexity. Simpler models stand a better chance of generalizing well to new data than more complicated models do. Hence, for example, in decision tree design, pruning techniques are widely employed to remove the irrelevant branches of an initially large tree.
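The trade-off described above can be sketched with a minimal penalized selection criterion. This is an illustrative example only, not the method developed in this paper: the candidate trees, their error rates, leaf counts, and the penalty weight `alpha` are all hypothetical, and the score used is a generic cost-complexity form (training error plus a per-leaf penalty).

```python
# Hypothetical illustration of an Occam's-razor-style model selection
# criterion: goodness of fit plus a complexity penalty. All numbers
# below are made up for illustration.

def penalized_score(train_error, n_leaves, alpha):
    """Cost-complexity score: training error plus alpha per leaf."""
    return train_error + alpha * n_leaves

# Two hypothetical candidate decision trees: a large tree that fits the
# training data almost perfectly, and a pruned tree with a worse fit.
large_tree  = {"train_error": 0.01, "n_leaves": 40}
pruned_tree = {"train_error": 0.08, "n_leaves": 5}

def select(alpha):
    """Pick the candidate with the lowest penalized score."""
    candidates = [large_tree, pruned_tree]
    return min(candidates,
               key=lambda t: penalized_score(t["train_error"],
                                             t["n_leaves"], alpha))

# With no penalty, goodness of fit alone picks the large tree.
assert select(alpha=0.0) is large_tree

# With a modest complexity penalty, the simpler tree wins:
# 0.01 + 0.005 * 40 = 0.21  vs.  0.08 + 0.005 * 5 = 0.105
assert select(alpha=0.005) is pruned_tree
```

The point of the sketch is that the ranking of models flips once complexity is penalized, which is exactly the behavior that pure goodness-of-fit selection cannot produce.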