ABSTRACT

In intuitive terms, the objective of statistical modeling is to separate a given data sequence into useful, learnable information and the rest, which may be regarded simply as noninformative noise. The difficulty lies in formalizing the two constituents: the ‘useful information’ and the ‘noise’. Traditionally, modeling is done by invoking a metaphysical ‘true’ data-generating distribution, which is to be estimated from the data by minimizing an appropriate mean performance criterion, itself estimated from the data. Since the basic issue of how to formalize the useful information and the noise is not addressed, such an approach cannot provide a rational explanation of why the best approximation of the ‘truth’ is not the most complex model fitted to the data. To avoid this disastrous conclusion, one has to add an ad hoc term to the criterion to penalize the model complexity. But because the added term lacks any deeper meaning, it reflects adequately neither the model complexity nor its effect on the model’s performance, and such a metaphysical assumption does not provide a sound basis for a fruitful theory of modeling.