Subset Selection of Predictor Variables in Multiple Linear Regression

ABSTRACT

This chapter discusses: model selection procedures for choosing a subset from a full set of predictors for a multiple linear regression model; why researchers may want to select a subset of predictors from a full set; why relying solely on automated selection procedures is contentious if the primary research objective is to understand the underlying relationship between an outcome variable and its predictors; how a model selection procedure may work well if the primary research objective is prediction; under-fitting, over-fitting, and the bias-variance trade-off; forward, backward, stepwise (Efroymson's algorithm), and all-subsets selection methods, and collinearity's impact on these methods; fit criteria involved in predictor selection; data-driven inference issues that occur when the same data set is used for selection and estimation; and model selection methods in the REG and GLMSELECT procedures with illustrative examples and SAS programs.