Overfitting: Old Problem, New Solution | 38 | v3

ABSTRACT

Overfitting, a problem akin to model inaccuracy, is as old as model building itself, as it is part of the modeling process. This chapter introduces a new solution, based on the data mining feature of the GenIQ Model, to the old problem of overfitting. It illustrates how the GenIQ Model identifies the complexity of the idiosyncrasies in the data under consideration and then directs the deletion of the individuals that contribute to that complexity. An overfitted model is one that comes close to reproducing the training data on which it is built by capitalizing on the idiosyncrasies of that data. Overfitted models have large predictive error variance; that is, the confidence interval about the prediction error is wide. The accuracy of an overfitted model on validation data will fall outside the neighborhood of its accuracy on the training data. The GenIQ Model consists of two components: a tree display and computer code.
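The gap between training and validation accuracy described above can be made concrete with a small sketch. It does not use the GenIQ Model. As a stand-in it uses a one-nearest-neighbor classifier, chosen only because it reproduces its training data exactly. The synthetic data, the 30% label-noise rate, the sample sizes, and all function names are assumptions made for illustration.

```python
import random

def make_data(n, rng, noise=0.3):
    """Hypothetical data: x in [0, 1); true label is 1 when x > 0.5.
    Each label is flipped with probability `noise` to create idiosyncrasies."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Return the label of the training point closest to x.
    This memorizes the training data, noise included."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def accuracy(train, data):
    """Share of points in `data` whose label the 1-NN rule predicts correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in data)
    return hits / len(data)

rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_data(200, rng)     # data the model is built on
valid = make_data(200, rng)     # fresh data from the same process

train_acc = accuracy(train, train)
valid_acc = accuracy(train, valid)
gap = train_acc - valid_acc

print(f"training accuracy:   {train_acc:.2f}")
print(f"validation accuracy: {valid_acc:.2f}")
print(f"gap:                 {gap:.2f}")
```

Because each training point is its own nearest neighbor, the training accuracy is perfect, while the validation accuracy is noticeably lower. A gap of this kind is the symptom the abstract describes: validation accuracy outside the neighborhood of training accuracy.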