ABSTRACT

Simple linear regression is a statistical method used to model the linear relationship between two quantitative variables, for explanatory or predictive purposes. Here we have one explanatory variable (denoted X) and one response variable (denoted Y), connected by the following model:

Y = β0 + β1X + ε

where ε is a term representing noise or measurement error. The parameters β0 and β1 are unknown. The aim is to estimate them from a sample of n pairs (x1, y1), . . . , (xn, yn). The model is written in indexed form:

yi = β0 + β1xi + εi

The coefficient β0 corresponds to the intercept and β1 to the slope. We estimate these parameters by minimising the least-squares criterion

(βˆ0, βˆ1) = argmin_{β0, β1} Σ_{i=1}^{n} (yi − β0 − β1xi)²
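Minimising this criterion has a well-known closed-form solution: the slope is the ratio of the sample covariance of x and y to the sample variance of x, and the intercept follows from the sample means. A minimal sketch in Python (the data points here are purely illustrative):

```python
import numpy as np

# Hypothetical sample of n pairs (x_i, y_i); any paired observations would do.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares estimates:
#   beta1_hat = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
#   beta0_hat = y_bar - beta1_hat * x_bar
x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar
```

On these illustrative points the estimates come out to β̂1 = 1.96 and β̂0 = 0.14, which can be cross-checked against a library routine such as np.polyfit(x, y, 1).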

Once the parameters have been estimated (βˆ0 and βˆ1), we obtain the regression line:

f(x) = βˆ0 + βˆ1x

from which predictions can be made. The fitted (or smoothed) values are defined by

yˆi = βˆ0 + βˆ1xi

and the residuals by

εˆi = yi − yˆi

Analysing the residuals is essential: it is used to check both the individual fit (detection of outliers) and the global fit of the model, for example by checking that the residuals show no structure.
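The fitted values and residuals defined above can be sketched as follows (a minimal illustration; the data points are hypothetical, and np.polyfit is used here as one convenient way to obtain the least-squares estimates):

```python
import numpy as np

# Hypothetical sample of n pairs (x_i, y_i).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares fit: np.polyfit with degree 1 returns [slope, intercept].
beta1_hat, beta0_hat = np.polyfit(x, y, 1)

y_hat = beta0_hat + beta1_hat * x   # fitted values  y^_i = beta0^ + beta1^ x_i
residuals = y - y_hat               # residuals      eps^_i = y_i - y^_i
```

With an intercept in the model, the residuals sum to zero up to numerical precision; plotting them against x or against the fitted values is the usual way to check for outliers and for any remaining structure.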