ABSTRACT
Regression: A frequently applied statistical technique that serves as a basis for studying and characterizing a system of interest, by formulating a reasonable mathematical model of the relationship between a response variable y and a set of p explanatory variables x1, x2, …, xp. The choice of an explicit form of the model may be based on previous knowledge of a system or on considerations such as ‘smoothness’ and continuity of y as a function of the explanatory variables (sometimes called the independent variables, although they are rarely independent; explanatory variables is the preferred term).
Simple linear regression: A linear regression model with a single explanatory variable. The data consist of n pairs of values (y1, x1), (y2, x2), …, (yn, xn). The model for the observed values of the response variable is
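$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n,$$
where $\beta_0$ and $\beta_1$ are, respectively, the intercept and slope parameters of the model and the $\varepsilon_i$ are error terms assumed to have a $N(0, \sigma^2)$ distribution. The parameters $\beta_0$ and $\beta_1$ are estimated from the sample observations by least squares, i.e., the minimization of $S$, where
$$S = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2,$$
$$\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i), \qquad \frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i).$$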
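Setting $\partial S / \partial \beta_0 = 0$ and $\partial S / \partial \beta_1 = 0$ leads to the following estimators of the two model parameters:
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$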
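As a minimal sketch of the two estimators above, assuming NumPy is available: the function name `fit_simple_linear` and the data values are hypothetical, chosen only for illustration.

```python
import numpy as np


def fit_simple_linear(x, y):
    """Return (b0, b1), the least-squares intercept and slope estimates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    b1 = np.sum((y - y_bar) * (x - x_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: the fitted line passes through (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, not taken from the source.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_linear(x, y)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

The closed-form expressions are used directly here rather than a library fitting routine, so that each line can be matched against the formulas.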
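The variance $\sigma^2$ is estimated by
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}.$$
The estimated variance of the estimated slope parameter is
$$\mathrm{Var}(\hat{\beta}_1) = \frac{s^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$
The estimated variance of a predicted value $y_{\mathrm{pred}}$ at a given value of $x$, say, $x_0$, is
$$\mathrm{Var}(y_{\mathrm{pred}}) = s^2 \left[ \frac{1}{n} + 1 + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right].$$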
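The three variance estimates can be computed in a few lines. The following is a sketch under stated assumptions: NumPy is available, the function name `simple_regression_variances` is hypothetical, and the data are invented for illustration.

```python
import numpy as np


def simple_regression_variances(x, y, x0):
    """Return (s2, var_b1, var_pred) for a simple linear regression fit.

    s2       : residual variance estimate with n - 2 degrees of freedom
    var_b1   : estimated variance of the slope estimate
    var_pred : estimated variance of a predicted value at x = x0
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    # Least-squares fit, needed to form the fitted values.
    b1 = np.sum((y - y.mean()) * (x - x_bar)) / sxx
    b0 = y.mean() - b1 * x_bar
    residuals = y - (b0 + b1 * x)
    s2 = np.sum(residuals ** 2) / (n - 2)
    var_b1 = s2 / sxx
    var_pred = s2 * (1.0 / n + 1.0 + (x0 - x_bar) ** 2 / sxx)
    return s2, var_b1, var_pred


# Hypothetical data, not taken from the source.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
s2, var_b1, var_pred = simple_regression_variances(x, y, x0=3.0)
print(s2, var_b1, var_pred)
```

Because the last term grows with the squared distance of x0 from the mean of x, the prediction variance is smallest at the mean and increases as x0 moves away from it.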
Multiple linear regression: A generalization to more than a single explanatory variable of the simple linear regression model. The multiple linear regression model is given by
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i,$$
where $y_i$ is the value of the response variable for the $i$th individual and $x_{i1}, x_{i2}, \ldots, x_{ip}$ represent this individual’s values on $p$ explanatory variables, with $i = 1, 2, \ldots, n$. As usual, $n$ represents the sample size. The residual or error terms $\varepsilon_i$, $i = 1, \ldots, n$, are assumed to be independent random variables having a normal distribution with mean zero and constant variance $\sigma^2$. So the response variable $y$ also has a normal distribution with expected value
$$E(y \mid x_1, x_2, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$
and variance $\sigma^2$.
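The entry does not say how the multiple regression coefficients are estimated. One common approach, assumed here, is least squares on a design matrix with a leading column of ones for the intercept. The sketch below uses NumPy's `numpy.linalg.lstsq`; the function name `fit_multiple_linear` and the noise-free data are hypothetical.

```python
import numpy as np


def fit_multiple_linear(X, y):
    """Return the least-squares estimates (b0, b1, ..., bp).

    X is an n-by-p array of explanatory variables, y a length-n response.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2.
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0], [4.0, 3.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]
beta = fit_multiple_linear(X, y)
print(beta)

# Estimated expected value E(y | x1, x2) at a new point (x1, x2) = (1, 1).
expected = beta[0] + beta[1:] @ np.array([1.0, 1.0])
print(expected)
```

With error-free data the fit should recover the generating coefficients up to floating-point rounding; with real data the estimates would vary from sample to sample.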