ABSTRACT

The central theme of this chapter is modeling associations among variables. Understanding these associations can be important for many reasons, including the following:

Reason 1. Prediction of future observations

Reason 2. Variable screening

Reason 3. System explanation

Reason 4. Parameter estimation

The primary tool this chapter uses to model associations among variables is regression. Regression analysis models the relationship between a single variable $Y$, called the response or dependent variable, and one or more explanatory variables $x_1, x_2, \ldots, x_{p-1}$, also called predictors or independent variables. The response variable must be continuous, but the predictor variables can be continuous, discrete, or categorical. The word “regression” is due to Sir Francis Galton, who demonstrated that offspring do not tend toward the size of their parents; rather, offspring size tends toward the mean of the population. That is, there is a “regression toward mediocrity.” The following examples illustrate scenarios where it is important to understand the associations among response and predictor variables.
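Before turning to those examples, here is a minimal sketch of what "modeling the relationship between $Y$ and $x_1, \ldots, x_{p-1}$" can look like computationally. It assumes a linear model fitted by ordinary least squares, solved through the normal equations $(X^\top X)\,b = X^\top y$, and it handles a categorical predictor by indicator (dummy) coding against a baseline level. The helper names `solve`, `ols`, and `dummy`, the data, and the coding scheme are all hypothetical illustrations chosen here, not the chapter's own method; the code is pure Python so it runs without third-party packages.

```python
# Hypothetical sketch: linear regression of a continuous response Y on one
# continuous and one categorical predictor, fitted by ordinary least squares.


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, y):
    """Return (b0, b1, ..., b_{p-1}) minimizing the sum of squared residuals.

    An intercept column of ones is prepended to each row of predictors.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    return solve(xtx, xty)


def dummy(levels, baseline):
    """Indicator-code a categorical predictor, dropping the baseline level."""
    others = sorted(set(levels) - {baseline})
    return [[1.0 if lev == o else 0.0 for o in others] for lev in levels], others


# Made-up data generated exactly from Y = 2 + 3x + 5 * [group == "b"].
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]          # continuous predictor
group = ["a", "b", "a", "b", "a", "b"]      # categorical predictor
d, names = dummy(group, baseline="a")
rows = [[xi] + di for xi, di in zip(x, d)]
y = [5.0, 13.0, 11.0, 19.0, 17.0, 25.0]     # continuous response
coef = ols(rows, y)
print(names, [round(c, 6) for c in coef])   # ['b'] [2.0, 3.0, 5.0]
```

Because the made-up response has no noise, the fit recovers the generating coefficients: an intercept of 2, a slope of 3 for $x$, and a shift of 5 for group "b" relative to the baseline "a". Real data would leave nonzero residuals, and in practice one would use a statistics package rather than hand-rolled elimination.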