ABSTRACT

The problem of overfitting has drawn significant attention, with results that sometimes appear contradictory, even about the mere presence or absence of the effect. In such experiments, the available data are split into two sets: a training set, used to estimate the parameters of the model (i.e., the weights and biases of the network), and a test set, used to estimate the generalization performance (i.e., the performance on out-of-sample data not used for fitting). An operational definition of overfitting is that the out-of-sample error starts to increase with training time after having passed through a minimum. A standard interpretation is that, at this point, the network starts to pick up idiosyncrasies of the training set that do not generalize to the test set.
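A minimal sketch of this operational definition, assuming a small numpy network on synthetic noisy data rather than the setup studied here: the data are split into a training and a test set, the out-of-sample error is tracked per epoch, and the epoch at which it reaches its minimum is reported; any subsequent increase would count as overfitting under the definition above. All hyperparameters are illustrative assumptions.

```python
# Sketch only: a tiny over-parameterized MLP trained by full-batch gradient
# descent; we track the test (out-of-sample) error per epoch and locate its
# minimum. Whether the error actually rises afterwards depends on the noise
# level, network size, and training time.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression task with observation noise.
x = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * x) + 0.3 * rng.normal(size=x.shape)

# Training set (parameter estimation) and test set (generalization estimate).
x_tr, y_tr, x_te, y_te = x[:50], y[:50], x[50:], y[50:]

# One hidden layer, deliberately over-parameterized relative to the data.
h = 50
W1 = rng.normal(0.0, 1.0, size=(1, h)); b1 = np.zeros(h)
W2 = rng.normal(0.0, 1.0, size=(h, 1)); b2 = np.zeros(1)

def forward(inp):
    a = np.tanh(inp @ W1 + b1)
    return a, a @ W2 + b2

lr, epochs = 0.05, 5000
test_err = []
for epoch in range(epochs):
    a, pred = forward(x_tr)
    err = pred - y_tr
    # Gradients of the mean squared error (up to a constant factor).
    gW2 = a.T @ err / len(x_tr); gb2 = err.mean(axis=0)
    da = (err @ W2.T) * (1 - a ** 2)
    gW1 = x_tr.T @ da / len(x_tr); gb1 = da.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
    # Out-of-sample error on the held-out test set.
    test_err.append(float(np.mean((forward(x_te)[1] - y_te) ** 2)))

best = int(np.argmin(test_err))
print(f"test error minimal at epoch {best}: {test_err[best]:.4f}, "
      f"final epoch {epochs - 1}: {test_err[-1]:.4f}")
```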