## The Linear Least Squares Problem

In the previous chapter, we saw several examples of the linear model in the form y = Xb + e, relating N response observations in the vector y to the explanatory variables stored as columns in the design matrix X. One mathematical view of the linear model is as finding the closest or best approximation Xb to the observed vector y. If we define closeness or distance in the familiar Euclidean manner, then finding the best approximation means minimizing the squared distance between the observed vector y and its approximation Xb, given by the sum of squares function

Q(b) = (y − Xb)^T (y − Xb) = ‖y − Xb‖². (2.1)
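As a concrete illustration, the sum of squares Q(b) in (2.1) can be evaluated directly for any candidate b. The data below are a made-up example (not from the text), chosen only to show the computation:

```python
import numpy as np

# Hypothetical small data set: N = 4 observations, p = 2 explanatory
# variables (an intercept column and one regressor).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0, 4.0])

def Q(b):
    """Sum of squares Q(b) = (y - Xb)^T (y - Xb) = ||y - Xb||^2."""
    r = y - X @ b          # residual vector y - Xb
    return r @ r           # squared Euclidean norm of the residual

b = np.array([1.0, 1.0])
print(Q(b))  # → 1.0
```

Different choices of b give different values of Q(b); the least squares solution is the b that makes this value as small as possible.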

A value of the vector b that minimizes Q(b) is called a least squares solution. We will refrain from calling it an estimator of b until we seriously discuss the idea of estimation. To find the least squares solution, we take the partial derivatives with respect to the components of b to obtain the gradient vector

∂Q/∂b = ( ∂Q/∂b₁ · · · ∂Q/∂bₚ )^T, (2.2)

a vector whose jth component is the partial derivative of the function Q with respect to the jth variable b_j, also written as (∂Q(b)/∂b)_j = ∂Q(b)/∂b_j. The minimum of the function Q(b) will occur when the gradient is zero, so we solve ∂Q/∂b = 0 to find least squares solutions. We now include some results on vector derivatives that will be helpful.
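To preview where this leads, the vector-derivative results give the gradient of Q as ∂Q/∂b = −2Xᵀ(y − Xb), a standard result anticipated here; setting it to zero yields the system XᵀXb = Xᵀy. A minimal sketch, reusing the hypothetical data above:

```python
import numpy as np

# Same hypothetical data as before (N = 4, p = 2).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0, 4.0])

# Setting the gradient dQ/db = -2 X^T (y - Xb) to zero gives the
# system X^T X b = X^T y, solved here directly.
b_hat = np.linalg.solve(X.T @ X, X.T @ y)

gradient = -2 * X.T @ (y - X @ b_hat)
print(b_hat)     # the least squares solution
print(gradient)  # numerically zero at the solution, as expected
```

The check that the gradient vanishes at b_hat is exactly the condition ∂Q/∂b = 0 used above to characterize least squares solutions.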