Chapter 4
Correlation and Regression
Pages 40

In many medical, biological, engineering, economic, and other scientific applications, one wishes to establish a linear relationship between two or more variables. If there are only two variables, X and Y, there are two ways to characterize a linear relationship: (1) using the correlation coefficient, and (2) using linear regression.

One would typically use a correlation coefficient to quantify the strength and direction of the linear association. If neither variable is used to predict the other, then both X and Y are assumed to be random variables, which makes inference more complicated. Linear regression, by contrast, is useful for answering the question: given a value of X, what is the predicted value of Y? For this type of question, the values of X are assumed to be fixed (chosen), while the values of Y are assumed to be random.

In this chapter, we first review the Pearson correlation coefficient and then cover simple, multiple, and polynomial regression models. Our primary approach to presenting the regression models uses the general linear model expressed in matrix form, and a short appendix at the end of the chapter reviews the necessary matrix algebra. This strategy lets us treat the different types of regression models with a single approach. The chapter also includes strategies for visualizing regression data and for building and assessing regression models. In the final section, we introduce two smoothing techniques, namely the loess smoother and smoothing polynomial splines. To support the discussion, we provide numerical examples with hand calculations that show how to fit simple models, along with SAS and R programs that implement the calculations for more complex models.
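As a minimal illustration of the two approaches described above, the following Python sketch computes the Pearson correlation coefficient and fits a simple linear regression by solving the normal equations of the general linear model, (X'X)b = X'y, where the design matrix X has a column of ones and a column of x values. This is not one of the chapter's SAS or R programs; the function names `pearson_r` and `fit_simple_regression` and the toy data are hypothetical, and plain Python is used only so the sketch runs without extra libraries.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Centered cross-products and sums of squares
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x via the normal equations (X'X) b = X'y.

    The design matrix X (hypothetical here, never built explicitly) has a
    column of ones and a column of x, so X'X = [[n, sum x], [sum x, sum x^2]]
    and X'y = [sum y, sum x*y].
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(a * a for a in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    # Determinant of the 2x2 matrix X'X; zero if all x values are equal
    det = n * sxx - sx * sx
    # b = (X'X)^{-1} X'y using the explicit inverse of a 2x2 matrix
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    r = pearson_r(x, y)
    b0, b1 = fit_simple_regression(x, y)
    print(f"r = {r:.4f}")               # strength and direction of association
    print(f"y_hat = {b0:.2f} + {b1:.2f} x")  # prediction equation for Y given X
    print(f"prediction at x = 6: {b0 + b1 * 6:.2f}")
```

For these toy data, r is about 0.77 and the fitted line is approximately y = 2.2 + 0.6x. Note the different roles: `pearson_r` is symmetric in x and y, whereas `fit_simple_regression` treats x as fixed and predicts y, matching the distinction drawn in the text.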