ABSTRACT

Generalized linear models (GLMs) (McCullagh and Nelder 1989) provide a unified framework for the analysis of data from exponential families. Let (xi, yi), i = 1, ..., n, denote independent observations on the independent variables x = (x1, ..., xd) and the dependent variable y. Assume that the yi are generated from a distribution in the exponential family with the density function

$$
g(y_i; f_i, \phi) = \exp\left\{ \frac{y_i h(f_i) - b(f_i)}{a_i(\phi)} + c(y_i, \phi) \right\}, \tag{6.1}
$$

where fi = f(xi), h(fi) is a monotone transformation of fi known as the canonical parameter, and φ is a dispersion parameter. The function f models the effect of the independent variables x. Denote µi = E(yi), Gc as the canonical link such that Gc(µi) = h(fi), and G as the link function such that G(µi) = fi. Then h = Gc ∘ G⁻¹, and h reduces to the identity function when the canonical link is chosen for G. The last term c(yi, φ) in (6.1) is independent of f. A GLM assumes that f(x) = xᵀβ.

Similar to the linear models for Gaussian data, the parametric GLM may be too restrictive for some applications. We consider the nonparametric extension of the GLM in this chapter. In addition to providing more flexible models, the nonparametric extension also provides model building and diagnostic methods for GLMs.

Let the domain of each covariate xk be an arbitrary set Xk. Consider f as a multivariate function on the product domain X = X1 × X2 × ··· × Xd. The SS ANOVA decomposition introduced in Chapter 4 may be applied to construct candidate model spaces for f. In particular, we will assume that f ∈ M, where

$$
M = H_0 \oplus H_1 \oplus \cdots \oplus H_q
$$

is an SS ANOVA model space defined in (4.30), H0 = span{φ1, ..., φp} is a finite-dimensional space collecting all functions that are not penalized, and Hj for j = 1, ..., q are orthogonal RKHSs with RKs Rj. The same notation as in Chapter 4 is used in this chapter.

We assume the same density function (6.1) for yi. However, for generality, we assume that f is observed through some linear functionals. Specifically, fi = Li f, where Li are known bounded linear functionals.
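
As a small numerical check of the notation in (6.1), the following sketch writes the Poisson density in that form and evaluates it under two link functions. The function names (`poisson_log_density`, `poisson_log_pmf`), the choice of the Poisson family with a_i(φ) = 1, and the square-root link are illustrative assumptions and are not taken from the text. With the canonical (log) link, h = Gc ∘ G⁻¹ is the identity; with the square-root link, h(f) = log(f²).

```python
import math


def poisson_log_density(y, f, G_inv):
    """Poisson log-density in the form (6.1), assuming a_i(phi) = 1.

    G_inv is the inverse of the link G, so that mu = G_inv(f).
    """
    canonical_link = math.log          # G_c for the Poisson family
    h = canonical_link(G_inv(f))       # h(f) = G_c(G^{-1}(f))
    b = math.exp(h)                    # b(f) = exp(h(f)), which equals mu
    c = -math.lgamma(y + 1)            # c(y, phi) = -log(y!)
    return y * h - b + c


def poisson_log_pmf(y, mu):
    """Standard Poisson log-probability, used only as a reference."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


mu, y = 2.5, 3

# Canonical link G = log: f = log(mu) and h is the identity.
f_log = math.log(mu)
canonical_value = poisson_log_density(y, f_log, math.exp)

# Non-canonical square-root link G = sqrt: f = sqrt(mu) and h(f) = log(f^2).
f_sqrt = math.sqrt(mu)
sqrt_value = poisson_log_density(y, f_sqrt, lambda f: f ** 2)

reference_value = poisson_log_pmf(y, mu)
h_at_f_log = math.log(math.exp(f_log))  # h(f) under the canonical link
```

Both evaluations agree with the standard Poisson log-probability up to rounding, since changing the link only changes how f parameterizes the mean µ.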