
The penalized least squares criterion (2.11) represents a compromise between goodness of fit and a penalty for departure from the null space H₀. The balance is controlled by the smoothing parameter λ. As λ varies from 0 to ∞, we obtain a family of estimates, with f̂ ∈ H₀ when λ = ∞.
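For concreteness, here is the criterion in its standard form (a sketch; the precise notation of (2.11) may differ):

\[
\min_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2 + \lambda J(f),
\]

where J(f) is a roughness penalty that vanishes exactly on H₀; for the cubic periodic spline used below, J(f) = ∫₀¹ (f″(x))² dx.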

To illustrate the impact of the smoothing parameter, consider the Stratford weather data, consisting of daily maximum temperatures in Stratford, Texas, during 1990. Observations are shown in Figure 3.1. Consider the regression model (1.1), where n = 73 and f represents the expected maximum temperature as a function of time in a year. Denote by x the time variable scaled into [0, 1]. It is reasonable to assume that f is a smooth periodic function; in particular, we assume that f ∈ W₂²(per). For a fixed λ, say 0.001, one can fit the cubic periodic spline as follows:

> library(assist)  # the assist package provides ssr(), periodic(), and the Stratford data
> data(Stratford); attach(Stratford)
> ssr(y~1, rk=periodic(x), limnla=log10(73*.001))

where the argument limnla specifies a search range for log10(nλ); supplying a single value, as here, fixes the smoothing parameter. To see how a spline fit is affected by the choice of λ, periodic spline fits with six different values of λ are shown in Figure 3.1. The fit with λ = ∞ is a constant, that is, f̂∞ ∈ H₀, while the fit with λ = 0 interpolates the data. A larger λ leads to a smoother fit. Both λ = 0.0001 and λ = 0.00001 lead to visually reasonable fits. Such a family of fits can be computed as sketched below.
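A sketch of how such a family of fits might be produced (the grid of log10(nλ) values, the prediction grid, and the plotting calls are illustrative choices, not the book's exact code; very large and very small values of log10(nλ) stand in for λ = ∞ and λ = 0, and the point estimates are assumed to be returned in the fit component of predict):

> plot(x, y)                            # observations
> xx <- seq(0, 1, length=100)           # prediction grid on [0, 1]
> for (l in c(4, 0, -2, -3, -4, -6)) {  # fixed values of log10(n*lambda)
+   fit <- ssr(y~1, rk=periodic(x), limnla=l)
+   lines(xx, predict(fit, newdata=data.frame(x=xx))$fit)
+ }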

In practice it is desirable to select the smoothing parameter using an objective method rather than visual inspection. In a sense, a data-driven choice of λ allows the data to speak for themselves. Thus, it is no exaggeration to say that the choice of λ is the spirit and soul of nonparametric regression. A data-driven fit is sketched below.
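For instance, omitting limnla lets ssr choose λ itself (a sketch; this assumes the default selection criterion is GCV, with alternatives available through the spar argument):

> fit <- ssr(y~1, rk=periodic(x))  # lambda selected by GCV (the default)
> summary(fit)                     # reports the estimated smoothing parameter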

We now inspect how λ controls the fit. Again, consider model (1.1) for the Stratford weather data. Let us first consider a parametric approach up to a

FIGURE 3.1 Stratford weather data, plot of observations, and the periodic spline fits with different smoothing parameters.