ABSTRACT

Suppose that one has a multi-variable function, which can be denoted as a function of a vector, and one would like to define the expression "the derivative of the function in a direction." The simplest way to do so is to build a function of a single variable from the multi-variable function and then define the derivative in the direction as the derivative of the new function calculated in the ordinary way. Suppose that the multi-variable function is f(~x). If we want a function that picks out a particular direction starting from a particular point, we let ~x = ~x0 + nˆt, where nˆ is a unit vector∗ in the direction of interest. (We use a unit vector so that when t = k we will have proceeded k units in the direction of interest. This also makes our definition of the directional derivative coincide with the ordinary derivative in the one-dimensional case.) Clearly the directional derivative (at ~x0, in the direction nˆ) is:

\[
\frac{d}{d\hat{n}} f(\vec{x}) \;\equiv\; \left.\frac{d}{dt} f(\vec{x}_0 + \hat{n}t)\right|_{t=0}
\;=\; \sum_{i=1}^{N} \partial_{x_i} f(\vec{x}_0)\,\hat{n}_i
\;=\; \nabla_{\vec{x}} f(\vec{x}_0) \cdot \hat{n} \tag{5.1}
\]

where · is the inner product of two vectors (see §A.15), and ∇~xf(~x0) is called the gradient of f(~x). The gradient is defined as:

\[
\nabla_{\vec{x}} f(\vec{x}_0) \;\equiv\;
\begin{pmatrix}
\left.\dfrac{\partial}{\partial x_1} f(\vec{x})\right|_{\vec{x}=\vec{x}_0} \\
\vdots \\
\left.\dfrac{\partial}{\partial x_N} f(\vec{x})\right|_{\vec{x}=\vec{x}_0}
\end{pmatrix}. \tag{5.2}
\]

If one is searching for the extrema (the minima and maxima) of f(~x) located in some open set, it is certainly necessary that:

\[
\frac{d}{d\hat{n}} f(\vec{x}) \;=\; \nabla_{\vec{x}} f(\vec{x}) \cdot \hat{n} \;=\; 0
\]

for all nˆ. This says that at an extremal point the gradient of the function must be orthogonal to every direction nˆ. The only vector that is orthogonal to all directions is the zero vector. We find that at an extremal point ~x0 (of a differentiable function, in an open set) we have:

\[
\left.\nabla_{\vec{x}} f(\vec{x})\right|_{\vec{x}=\vec{x}_0} = \vec{0}. \tag{5.3}
\]
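As a concrete numerical check of (5.1) and (5.3), the following Python sketch compares the ordinary derivative of f(~x0 + nˆt) with respect to t at t = 0 (approximated by a finite difference) against ∇~xf(~x0) · nˆ, and verifies that the gradient vanishes at the minimizer. The quadratic f, the point ~x0, and the direction nˆ are arbitrary choices made only for this illustration, and the gradient is written out by hand.

import numpy as np

# Example function and its hand-written gradient (chosen only for illustration):
# f(x) = x1^2 + 3*x2^2 + x1*x2
def f(x):
    return x[0]**2 + 3.0 * x[1]**2 + x[0] * x[1]

def grad_f(x):
    return np.array([2.0 * x[0] + x[1], 6.0 * x[1] + x[0]])

x0 = np.array([1.0, -2.0])            # base point ~x0 (arbitrary)
n = np.array([3.0, 4.0])              # direction of interest (arbitrary)
n_hat = n / np.linalg.norm(n)         # unit vector, as in the text

# Left-hand side of (5.1): d/dt f(x0 + n_hat*t) at t = 0,
# approximated by a central finite difference.
h = 1e-6
lhs = (f(x0 + h * n_hat) - f(x0 - h * n_hat)) / (2.0 * h)

# Right-hand side of (5.1): gradient at x0 dotted with the unit direction.
rhs = grad_f(x0) @ n_hat

print(lhs, rhs)                       # the two values agree (up to finite-difference error)

# Equation (5.3): at the minimizer of this quadratic (the origin),
# the gradient is the zero vector.
print(grad_f(np.array([0.0, 0.0])))   # -> [0. 0.]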

Suppose that one would like to “solve” the linear system:

\[
A\vec{x} = \vec{b}
\]

where ~x has n elements, A is an m × n array, and m > n. (See Appendix A for a review of linear algebra.) In general there are no solutions to this equation, so we must weaken our notion of solution. A reasonable definition of a solution, called the least-squares solution, is the vector ~x that minimizes:

\[
e(\vec{x}) \;\equiv\; \|A\vec{x} - \vec{b}\|^2.
\]

To find this minimum, we look for the ~x for which the gradient of the function e(~x) is ~0. Let us consider:

\[
\partial_{x_i} \|A\vec{x} - \vec{b}\|^2.
\]
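Setting this gradient to zero leads, once the derivative is carried out, to the standard normal equations AᵀA~x = Aᵀ~b. As a rough numerical illustration of the least-squares solution, the Python sketch below builds a small overdetermined system (the entries of A and ~b are invented for this example, with m = 4 and n = 2), solves the normal equations directly, and cross-checks the result against NumPy's least-squares routine.

import numpy as np

# A small overdetermined system: m = 4 equations, n = 2 unknowns.
# The entries of A and b are invented purely for illustration.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([6.0, 5.0, 7.0, 10.0])

# Setting the gradient of e(x) = ||A x - b||^2 to zero yields the normal
# equations A^T A x = A^T b; solve them directly.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against the library's least-squares routine.
x_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)   # [3.5 1.4]
print(x_lstsq)    # agrees with x_normal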