ABSTRACT

In the regression setting, variable selection by exhaustive search over all combinations of covariates is called best subset selection. The goal is to identify the optimal model, and an independent data set is desirable for assessing the chosen model. Cross-validation is a data-splitting procedure that repeatedly splits the data into a training set and a test set: many validation sets are generated in this way, and the complementary part of the data is used each time as the training set. The vertical dashed line marks where the minimum occurs, and the horizontal line shows the one-SD rule. Regularization involves shrinking the regression coefficients towards 0, which reduces variance in exchange for a small increase in bias. All penalized regression methods require selection of the regularization parameter, which determines the strength of the imposed penalty.
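As an illustration of how cross-validation can be used to select the regularization parameter, the following is a minimal sketch and not the procedure used in this work. It assumes a ridge penalty on a single predictor with no intercept, 5-fold splitting, and a small hypothetical grid of penalty values. The threshold is taken as the minimum mean error plus one standard error across folds, which is one common reading of the one-SD rule. All function names and the simulated data are illustrative assumptions.

```python
import random
import statistics


def ridge_fit(xs, ys, lam):
    """Single-predictor ridge estimate without intercept.

    Minimizes sum((y - beta*x)^2) + lam * beta^2, which gives
    beta = sum(x*y) / (sum(x^2) + lam). Larger lam shrinks beta towards 0.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_curve(xs, ys, lambdas, k=5, seed=0):
    """Return (lam, mean validation MSE, standard error over folds) per lam."""
    folds = kfold_indices(len(xs), k, seed)
    curve = []
    for lam in lambdas:
        fold_mse = []
        for test in folds:
            held_out = set(test)
            # The complement of the validation fold is the training set.
            train = [i for i in range(len(xs)) if i not in held_out]
            beta = ridge_fit([xs[i] for i in train], [ys[i] for i in train], lam)
            errors = [(ys[i] - beta * xs[i]) ** 2 for i in test]
            fold_mse.append(statistics.fmean(errors))
        mean = statistics.fmean(fold_mse)
        se = statistics.stdev(fold_mse) / len(fold_mse) ** 0.5
        curve.append((lam, mean, se))
    return curve


def select_lambda(curve):
    """Return (lam at minimum CV error, largest lam within one SE of it)."""
    lam_min, mean_min, se_min = min(curve, key=lambda t: t[1])
    threshold = mean_min + se_min
    lam_rule = max(lam for lam, mean, _ in curve if mean <= threshold)
    return lam_min, lam_rule


# Simulated data (assumption for illustration only): y = 2x + noise.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(60)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]
lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]  # hypothetical penalty grid

curve = cv_curve(xs, ys, lambdas)
lam_min, lam_rule = select_lambda(curve)
print("lambda at minimum CV error:", lam_min)
print("lambda chosen by the one-standard-error threshold:", lam_rule)
```

The second value is never smaller than the first, because the minimizing penalty always satisfies the threshold. The rule therefore favors a more strongly penalized model whose estimated error is statistically close to the minimum.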