ABSTRACT
Regression model building, as we have seen, can be straightforward, but it can
also be tricky. Often, if the researcher knows which variables are
important and of interest, little effort is needed. However, when a researcher
is exploring new areas or consulting for others, this is often not the case. In
these situations, it can be valuable to collect data on a wide range of variables
thought to influence the outcome of the dependent variable, y. The entire process may be viewed as
1. Identifying independent predictor x_i variables of interest
2. Collecting measurements on those x_i variables related to the observed measurements of the y_i values
3. Selecting significant x_i variables by statistical procedures, in terms of increasing SSR and decreasing SSE
4. With the selected variables, validating the conditions under which the model is adequate
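Step 3 of this process can be illustrated with a minimal sketch of forward selection, in which a candidate x_i enters the model only if it lowers SSE (and so raises SSR) by enough to pass a partial F-test. The helper names (solve, fit_sums_of_squares, forward_select), the synthetic data, and the F-to-enter threshold of 4.0 are all assumptions made for illustration; this is not presented as the author's procedure.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_sums_of_squares(columns, y):
    """Fit y on an intercept plus the given columns; return (SSR, SSE)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(XtX, Xty)
    y_hat = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual variation
    return ssr, sse

def forward_select(candidates, y, f_to_enter=4.0):
    """Add, one at a time, the variable with the largest partial F statistic."""
    selected = []
    _, sse_current = fit_sums_of_squares([], y)  # intercept only: SSE equals SST
    while len(selected) < len(candidates):
        best = None
        for name in candidates:
            if name in selected:
                continue
            cols = [candidates[s] for s in selected + [name]]
            _, sse_new = fit_sums_of_squares(cols, y)
            df_error = len(y) - (len(cols) + 1)
            f_stat = (sse_current - sse_new) / (sse_new / df_error)
            if best is None or f_stat > best[1]:
                best = (name, f_stat, sse_new)
        if best[1] < f_to_enter:  # no remaining variable reduces SSE enough
            break
        selected.append(best[0])
        sse_current = best[2]
    return selected

# Synthetic data (hypothetical): y depends on x1 and x2; x3 is unrelated noise.
random.seed(1)
n = 40
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
y = [5 + 3 * a + 2 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

candidates = {"x1": x1, "x2": x2, "x3": x3}
selected = forward_select(candidates, y)
print("selected:", selected)
```

Because SSR + SSE equals the total sum of squares for any least-squares fit with an intercept, every added variable raises SSR by exactly the amount it lowers SSE; the partial F-test only asks whether that change is large relative to the remaining error variance.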
It is not uncommon for researchers to collect data on more variables than are
practical for use in regression analysis. For example, in a laundry detergent
validation study for which the author recently consulted, two methods were
used: one for top-loading machines and another for front-loading machines.
The main difference between the machines was water volume. Several
microorganism species were used in the study, against three concentrations of an
antimicrobial laundry soap. Testing was conducted by two teams of technicians
at each of six different laboratories over a five-day period. The number
of variables needed to answer the research question, "Do significant differences in
the data exist among the test laboratories?" was extreme.