ABSTRACT

There are several fundamentally different situations in which it may be desired to select a subset from a larger number of variables. The situation with which this monograph is concerned is that of predicting the value of one variable, which will be denoted by Y, from a number of other variables, which will usually be denoted by X’s. It may be necessary to do this because it is expensive to measure the variable Y and it is hoped to be able to predict it with sufficient accuracy from other variables which can be measured cheaply. A more common situation is that in which the X-variables measured at one time can be used to predict Y at some future time. In either case, unless the true form of the relationship between the X- and Y-variables is known, it will be necessary for the data used to select the variables and to calibrate the relationship to be representative of the conditions in which the relationship will be used for prediction. This last remark particularly applies when the prediction requires extrapolation, e.g. in time, beyond the range over which a relationship between the variables is believed to be an adequate approximation. Some examples of applications are:

1. The estimation of wool quality, which can be measured accurately using chemical techniques requiring considerable time and expense, from reflectances in the near-infrared region, which can be obtained quickly and relatively inexpensively.