
B.1 The Least Squares Principle as a Maximum Likelihood Principle

We have introduced the likelihood as the probability of observing what we have observed. If we consider a continuous random variable $Y$, we can identify this probability with the value of the density of the distribution of $Y$ evaluated at the observed value $y$. Now, given the covariate values $x_1, x_2, \ldots, x_p$, the assumption of a normal distribution of $Y$ with mean $\mu(x_1, x_2, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p$ and conditional variance $\sigma_e^2$ implies that the density is of the following form:

$$
f(y) = \frac{1}{\sqrt{2\pi\sigma_e^2}}\, e^{-\frac{1}{2}\,\frac{\left(y - (\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p)\right)^2}{\sigma_e^2}},
$$

such that the likelihood of the observation $y_i$ is

$$
L_i(\beta_0, \beta_1, \ldots, \beta_p, \sigma_e^2) = \frac{1}{\sqrt{2\pi\sigma_e^2}}\, e^{-\frac{1}{2}\,\frac{\left(y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip})\right)^2}{\sigma_e^2}},
$$

and the log likelihood is

$$
\begin{aligned}
l_i(\beta_0, \beta_1, \ldots, \beta_p, \sigma_e^2) &= \log L_i(\beta_0, \beta_1, \ldots, \beta_p, \sigma_e^2) \\
&= \log\left[\frac{1}{\sqrt{2\pi\sigma_e^2}}\right] + \log\left[e^{-\frac{1}{2}\,\frac{\left(y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip})\right)^2}{\sigma_e^2}}\right] \\
&= \log\left[\frac{1}{\sqrt{2\pi\sigma_e^2}}\right] - \frac{1}{2}\,\frac{\left(y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip})\right)^2}{\sigma_e^2}.
\end{aligned}
$$
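The following is a minimal numerical sketch (not part of the original text) of the identity just derived: it evaluates the per-observation log likelihood $l_i$ with the formula above, checks it against the normal log density, and verifies that the $\beta$ maximising the summed log likelihood coincides with the least squares estimate, anticipating the equivalence discussed below. The simulated data, the single-covariate model, and the use of numpy/scipy are illustrative assumptions.

```python
# Minimal sketch: per-observation log likelihood under the normal model and
# its link to least squares. Data and the model y = beta0 + beta1*x are
# illustrative assumptions, not taken from the text.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])          # design matrix with intercept
beta_true = np.array([2.0, 0.5])
sigma_e = 1.5
y = X @ beta_true + rng.normal(0.0, sigma_e, size=n)

def log_lik(beta, sigma2):
    """Summed log likelihood sum_i l_i(beta, sigma2) under the normal model."""
    mu = X @ beta
    return np.sum(np.log(1.0 / np.sqrt(2 * np.pi * sigma2))
                  - 0.5 * (y - mu) ** 2 / sigma2)

# Check the derived formula for l_i against the normal log density.
i = 0
l_i_formula = (np.log(1.0 / np.sqrt(2 * np.pi * sigma_e**2))
               - 0.5 * (y[i] - X[i] @ beta_true) ** 2 / sigma_e**2)
assert np.isclose(l_i_formula,
                  norm.logpdf(y[i], loc=X[i] @ beta_true, scale=sigma_e))

# Least squares estimate of beta ...
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# ... and the beta maximising the summed log likelihood (sigma_e^2 held fixed):
res = minimize(lambda b: -log_lik(b, sigma_e**2), x0=np.zeros(2))
print(beta_ols, res.x)   # the two estimates agree up to numerical tolerance
```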

Maximising the overall likelihood $\prod_{i=1}^{n} L_i(\beta_0, \beta_1, \ldots, \beta_p, \sigma_e^2)$ is equivalent to maximising its logarithm