B.1 The Least Squares Principle as a Maximum Likelihood Principle

We have introduced the likelihood as the probability of observing what we have observed. If we consider a continuous random variable $Y$, we can identify this probability with the value of the density of the distribution of $Y$ evaluated at the observed value $y$. Now, given the covariate values $x_1, x_2, \ldots, x_p$, the assumption of a normal distribution of $Y$ with mean $\mu(x_1, x_2, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p$ and conditional variance $\sigma_e^2$ implies that the density is of the following form:

$$f(y) = \frac{1}{\sqrt{2\pi\sigma_e^2}}\, e^{-\frac{(y - \mu(x_1, x_2, \ldots, x_p))^2}{2\sigma_e^2}},$$
such that the likelihood of the observation $y_i$ is

$$L_i(\beta_0, \beta_1, \ldots, \beta_p, \sigma_e^2) = \frac{1}{\sqrt{2\pi\sigma_e^2}}\, e^{-\frac{(y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip}))^2}{2\sigma_e^2}}$$
and the log likelihood is

$$\begin{aligned}
l_i(\beta_0, \beta_1, \ldots, \beta_p, \sigma_e^2) &= \log L_i(\beta_0, \beta_1, \ldots, \beta_p, \sigma_e^2)\\[2pt]
&= \log\left[\frac{1}{\sqrt{2\pi\sigma_e^2}}\right] + \log\left[e^{-\frac{(y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip}))^2}{2\sigma_e^2}}\right]\\[2pt]
&= \log\left[\frac{1}{\sqrt{2\pi\sigma_e^2}}\right] - \frac{1}{2}\,\frac{(y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip}))^2}{\sigma_e^2}.
\end{aligned}$$
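The equivalence this derivation establishes can be checked numerically: maximising the normal log likelihood in the $\beta$s yields the same coefficients as ordinary least squares. The sketch below uses NumPy and SciPy on simulated data; the data, sample size, and true coefficients are illustrative assumptions, not part of the text.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: n observations, p = 2 covariates (illustrative only).
rng = np.random.default_rng(0)
n, p = 100, 2
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([2.0, -0.5]) + rng.normal(scale=0.3, size=n)
X1 = np.column_stack([np.ones(n), X])  # design matrix with intercept column

def neg_log_likelihood(theta):
    """Negative sum of the per-observation log likelihoods l_i under the
    normal model; theta = (beta_0, ..., beta_p, log sigma_e)."""
    beta, log_sigma = theta[:-1], theta[-1]
    sigma2 = np.exp(2 * log_sigma)           # sigma_e^2, kept positive
    resid = y - X1 @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma2) + 0.5 * np.sum(resid**2) / sigma2

# Maximise the likelihood (i.e. minimise its negative) over beta and sigma_e ...
ml_beta = minimize(neg_log_likelihood, x0=np.zeros(p + 2), method="BFGS").x[:-1]
# ... and compare with the ordinary least squares solution.
ols_beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(np.allclose(ml_beta, ols_beta, atol=1e-3))
```

Only the quadratic term of $l_i$ involves the $\beta$s, so maximising the log likelihood in $\beta$ minimises the sum of squared residuals, whatever the value of $\sigma_e^2$.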
Maximising the overall likelihood $\prod_{i=1}^{n} L_i(\beta_0, \beta_1, \ldots, \beta_p, \sigma_e^2)$ is equivalent to maximising its logarithm