ABSTRACT

Let us conclude by posing a number of questions and examining how far they can be answered.

Question 1. How can we test whether there is any relationship between the predictors and the predictand? This is a frequent question in the social and biological sciences. Data have been collected on, say, 20 or 50 predictors, and these may have been augmented with constructed variables such as reciprocals, logarithms, squares, and interactions of the original variables. An automatic computer package may have selected 5 or 10 of these predictors, and has probably output an R² value for the selected subset. Could we have done as well if the predictors had been replaced with random numbers or columns from the telephone directory? If the package used the Efroymson stepwise algorithm, sometimes simply called stepwise regression, then Table 4.4 or formula (4.1) can be used to test whether the value of R² could reasonably have arisen by chance if there is no real relationship between the Y-variable and any of the X-variables. Clearly, the more exhaustive the search procedure used, the higher the R² value that can be achieved. References are given in section 4.1 to tables for other search algorithms, though there is scope for the extension of these tables. Some of these tables allow for nonorthogonality of the predictors; others do not. In fact, the degree of correlation among the predictors does not make much difference to the distribution of R². Alternatively, if the number of observations exceeds the number of available predictors, the Spjøtvoll test described in section 4.2 can be used to test whether the selected subset fits significantly better than just a constant.
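Where no suitable table is to hand, the question of whether the selected subset's R² could have arisen by chance can also be approached by direct simulation. The sketch below is not Table 4.4, formula (4.1), or the Spjøtvoll test. It substitutes a simple permutation approach: the Y-values are shuffled to destroy any real relationship with the X-variables, the selection procedure is rerun on each shuffled copy, and the observed R² is compared with the resulting values. It also assumes plain forward selection rather than the Efroymson algorithm, and the function names (`r_squared`, `forward_select_r2`, `chance_p_value`) and the default of 200 shuffles are illustrative choices only.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of a least-squares fit of y on the columns of X plus a constant."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    total_ss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / total_ss


def forward_select_r2(X, y, p):
    """Greedy forward selection of p columns of X.

    Assumption: plain forward selection, with no deletion step, stands in
    for whatever search procedure the package actually used.
    Returns (list of selected column indices, R^2 of the selected subset).
    """
    selected = []
    remaining = list(range(X.shape[1]))
    best_r2 = 0.0
    for _ in range(p):
        # Try adding each remaining column and keep the one with the largest R^2.
        scores = [(r_squared(X[:, selected + [j]], y), j) for j in remaining]
        best_r2, best_j = max(scores)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected, best_r2


def chance_p_value(X, y, p, n_sim=200, seed=0):
    """Permutation estimate of how often selection alone reaches the observed R^2.

    Each shuffle of y breaks any real link with X, and the full selection
    procedure is repeated on it, so the search effort is accounted for.
    Returns (observed R^2, estimated p-value).
    """
    rng = np.random.default_rng(seed)
    _, observed = forward_select_r2(X, y, p)
    null_r2 = np.array(
        [forward_select_r2(X, rng.permutation(y), p)[1] for _ in range(n_sim)]
    )
    # The +1 terms count the observed data as one of the arrangements.
    p_value = (1 + np.sum(null_r2 >= observed)) / (n_sim + 1)
    return observed, p_value
```

With `X` an array of n observations by k available predictors and `y` the corresponding Y-values, a call such as `chance_p_value(X, y, p=5)` returns the R² of the best five-predictor subset found by forward selection, together with the proportion of shuffled data sets on which the same search did at least as well. The essential point is that the selection is repeated inside every shuffle. Comparing the observed R² with that of a fixed, preselected subset would ignore the search and overstate the significance.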