ABSTRACT

There are many criteria in use for deciding how many variables to include. These can be broadly divided into three classes, namely:

• Prediction criteria
• Likelihood or information criteria
• Maximization of Bayesian posterior probabilities

The emphasis in this book is upon prediction rather than upon finding the ‘right’ model, and hence there is far more material here on prediction than on likelihood criteria. Bayesian methods are treated later, in Chapter 7.

Under prediction criteria, we first look at the mean squared error of prediction (MSEP) for the case in which the same values of the X-variables will be used for prediction as those in the data used for model construction. This is sometimes known as the fixed model, or X-fixed, case. The principal criterion used here is Mallows’ Cp, together with variants of it. We look in detail at the case in which the X-predictors are orthogonal. Though this case is not common in practice, with most data coming from observational studies rather than from designed experiments, it is useful to examine for the insight it gives into the biases that occur in more general cases.

The random model, or X-random, case assumes that the future cases for which predictions will be required will come from the same multivariate normal population as the data used for model construction.

The section on prediction criteria concludes with a review of some of the applications of cross-validation to model building. The basic idea here is that part of the data is used for model selection, and the remainder is used for validation. We start with the PRESS statistic, which applies leave-one-out cross-validation to the model chosen using the full data set. We then look at leaving out, say, nv cases at a time, selecting a model or a number of models that fit the remaining (n − nv) cases well, and then finding how well the model or models fit the nv validation cases. This is repeated many times for different random samples of nv cases out of the n.

Under likelihood and other criteria, including the minimum description length (MDL), we look at the Akaike Information Criterion (AIC) and variations of it such as the Bayesian Information Criterion (BIC), among others.

In the appendix at the end of this chapter, some of the criteria are roughly equated to F-to-enter criteria.
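For concreteness, the standard textbook forms of the two information criteria named above (not this chapter’s own derivation) are, for a model with p fitted parameters, n cases, and maximized likelihood L̂:

```latex
\mathrm{AIC} = -2\log\hat{L} + 2p,
\qquad
\mathrm{BIC} = -2\log\hat{L} + p\log n .
```

BIC’s penalty per parameter grows with the sample size, so for large n it tends to select smaller subsets of variables than AIC does.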
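The two cross-validation schemes summarized above — the leave-one-out PRESS statistic and repeated random holdout of nv cases — can be sketched as follows for a simple one-predictor regression. The function names and the closed-form least-squares fit are illustrative only, not the book’s notation.

```python
import random

def fit_simple_ols(x, y):
    """Closed-form least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

def press(x, y):
    """PRESS: leave each case out in turn, refit the model, and sum the
    squared errors of predicting the held-out case."""
    total = 0.0
    for i in range(len(x)):
        xt = x[:i] + x[i + 1:]
        yt = y[:i] + y[i + 1:]
        a, b = fit_simple_ols(xt, yt)
        total += (y[i] - (a + b * x[i])) ** 2
    return total

def repeated_holdout_mse(x, y, nv, n_splits=100, seed=0):
    """Repeatedly hold out a random sample of nv cases, fit on the
    remaining n - nv, and average the mean squared error on the
    nv validation cases over the splits."""
    rng = random.Random(seed)
    n = len(x)
    mses = []
    for _ in range(n_splits):
        val = set(rng.sample(range(n), nv))
        xt = [x[i] for i in range(n) if i not in val]
        yt = [y[i] for i in range(n) if i not in val]
        a, b = fit_simple_ols(xt, yt)
        mse = sum((y[i] - (a + b * x[i])) ** 2 for i in val) / nv
        mses.append(mse)
    return sum(mses) / len(mses)
```

In a subset-selection setting, the same skeleton would be run once per candidate subset of predictors, with the subset (or the whole selection procedure) chosen on the n − nv training cases and scored on the nv validation cases.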