ABSTRACT
Yi = x′iβ + Ui, i = 1, . . . , n (4.1)
with observations Y1, . . . , Yn, an unknown and unobservable parameter β ∈ Rp, where xi ∈ Rp, i = 1, . . . , n are either given deterministic vectors or observable random vectors (regressors), and U1, . . . , Un are independent errors with a joint distribution function F. Often we consider the model in which the first component β1 of β is an intercept; that is, xi1 = 1, i = 1, . . . , n. The distribution function F is generally unknown; we only assume that it belongs to some family F of distribution functions. Denoting
Y = (Y1, . . . , Yn)′
U = (U1, . . . , Un)′
X = Xn = (x1, . . . , xn)′ (the n × p matrix with rows x′1, . . . , x′n)
we can rewrite (4.1) in the matrix form
Y = Xβ + U (4.2)
The most popular estimator of β is the classical least squares estimator (LSE) β̂. If X is of rank p, then β̂ is equal to
β̂ = (X ′X)−1X ′Y (4.3)
By the Gauss–Markov theorem, β̂ is the best linear unbiased estimator of β, provided the errors U1, . . . , Un have finite second moments. Moreover, β̂ is the maximum likelihood estimator of β if U1, . . . , Un are normally distributed.
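The computation of the LSE (4.3) can be sketched numerically as follows; this is a minimal illustration, assuming a hypothetical simulated design with an intercept column (xi1 = 1) and standard normal errors, with n, p, and the "true" β chosen arbitrarily for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n = 200 observations, p = 3 parameters,
# with an intercept column x_i1 = 1 as in the text.
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])   # "true" beta, chosen for this sketch
U = rng.normal(size=n)              # independent errors, here F = N(0, 1)
Y = X @ beta + U                    # model (4.2): Y = X beta + U

# Classical LSE (4.3): beta_hat = (X'X)^{-1} X'Y.
# Solving the normal equations (X'X) b = X'Y avoids forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Equivalent, numerically preferable route via a least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```

In practice one rarely forms (X′X)−1 explicitly; solving the normal equations, or better a QR-based least-squares routine such as `np.linalg.lstsq`, is more stable when X is ill-conditioned.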