ABSTRACT

Table 3.1 shows the heights (in centimeters) and the resting pulse rates (beats per minute) for 5 of a sample of 50 hospital patients (data sets with two variables are often referred to as bivariate data). Is it possible to use these data to construct a model for predicting pulse rate from height, and what type of model might be used? Such questions serve to introduce one of the most widely used statistical techniques: regression analysis. In very general terms, regression analysis involves the development and use of statistical techniques designed to reflect the way in which variation in an observed random variable changes with changing circumstances. More specifically, the aim of a regression analysis is to derive an equation relating a response (dependent) variable to an explanatory variable or, more commonly, to several explanatory variables. The derived equation may sometimes be used solely for prediction, but more often its primary purpose is to establish the relative importance of the explanatory variables in determining the response variable, that is, to establish a useful model to describe the data. (Incidentally, the term regression was first introduced by Galton in the 19th century to characterize a tendency toward mediocrity, that is, toward the average, observed in the offspring of parents.)

In this chapter, we shall concern ourselves with regression models for a continuous response variable and a single explanatory variable. In Chapter 4, we shall extend the model to deal with the situation in which there are several explanatory variables, and then, in Chapter 6, we shall consider suitable models for categorical response variables.
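
As a rough illustration of the single-explanatory-variable case described above, the following minimal sketch fits a straight line by ordinary least squares and uses it for prediction. It assumes that a straight-line model is appropriate, which the text has not yet established; the function names `fit_simple_linear_regression` and `predict` are hypothetical, and the numbers in the usage lines are placeholder values chosen to lie exactly on a line, not the patient data of Table 3.1.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least two points")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of the explanatory variable about its mean.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the explanatory variable must take at least two distinct values")
    # Sum of cross-products of deviations of x and y about their means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted response at a new value of the explanatory variable."""
    return intercept + slope * x_new


# Placeholder values only (they lie exactly on y = 1 + 2x); not data from Table 3.1.
x_demo = [1, 2, 3, 4]
y_demo = [3, 5, 7, 9]
intercept, slope = fit_simple_linear_regression(x_demo, y_demo)
print(intercept, slope, predict(intercept, slope, 5))
```

With real bivariate data, such as heights and pulse rates, the same two functions would be applied to the observed values of the explanatory and response variables respectively.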