
Our basic framework is as before: X ∈ 𝒳, X ∼ P ∈ 𝒫, usually parametrized as 𝒫 = {Pθ : θ ∈ Θ}. See Section 1.1.2. In this parametric case, how do we select reasonable estimates of θ itself? That is, how do we find a function θ̂(X) of the vector observation X that in some sense “is close” to the unknown θ? The fundamental heuristic is typically the following. Consider a function

ρ : 𝒳 × Θ → ℝ

and define

D(θ0, θ) ≡ Eθ0 ρ(X, θ).

Suppose that, as a function of θ, D(θ0, θ) measures the (population) discrepancy between θ and the true value θ0 of the parameter, in the sense that D(θ0, θ) is uniquely minimized at θ = θ0. That is, if Pθ0 were true and we knew D(θ0, θ) as a function of θ, we could obtain θ0 as the minimizer. Of course, we do not know the truth, so this is inoperable, but

ρ(X, θ) is the optimal MSPE predictor of D(θ0, θ) (Lemma 1.4.1). So it is natural to consider θ̂(X), the minimizer of ρ(X, θ), as an estimate of θ0. Under these assumptions we call ρ(·, ·) a contrast function and θ̂(X) a minimum contrast estimate.

Now suppose Θ is a subset of Euclidean space ℝd, the true θ0 is an interior point of Θ, and θ ↦ D(θ0, θ) is smooth. Then we expect

∇θ D(θ0, θ) |θ=θ0 = 0    (2.1.1)

where ∇ denotes the gradient, ∇θ = (∂/∂θ1, . . . , ∂/∂θd)ᵀ.
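As a concrete (hypothetical) illustration of this heuristic, consider the squared-error contrast ρ(x, θ) = (x − θ)² in a location model. Then D(θ0, θ) = Eθ0 (X − θ)² = Var(X) + (θ0 − θ)², which is uniquely minimized at θ = θ0, so ρ is a contrast function, and the estimating equation (2.1.1) applied to the empirical contrast yields the sample mean in closed form. The sketch below (using NumPy; the sample size, seed, and grid are arbitrary choices, not from the text) minimizes the empirical contrast numerically and checks that the minimizer agrees with the sample mean:

```python
import numpy as np

# Hypothetical example: squared-error contrast rho(x, theta) = (x - theta)^2
# for a normal location model X ~ N(theta0, 1). Here
#   D(theta0, theta) = E[(X - theta)^2] = 1 + (theta0 - theta)^2,
# which is uniquely minimized at theta = theta0, so rho is a contrast function.

rng = np.random.default_rng(0)
theta0 = 2.5
X = theta0 + rng.standard_normal(1000)   # sample from P_theta0

def contrast(theta, x):
    """Empirical contrast: the average of rho(X_i, theta) over the sample."""
    return float(np.mean((x - theta) ** 2))

# Minimize the empirical contrast over a fine grid of candidate theta values.
grid = np.linspace(0.0, 5.0, 5001)                       # step 0.001
values = ((X[:, None] - grid[None, :]) ** 2).mean(axis=0)
theta_hat = float(grid[np.argmin(values)])

# For this contrast, the estimating equation (2.1.1) has the closed-form
# solution theta_hat = X-bar, so the grid minimizer should match the sample
# mean up to the grid resolution.
print(theta_hat, float(X.mean()))
```

The grid search stands in for a generic numerical minimizer; for smooth contrasts one would instead solve the gradient equation (2.1.1) directly, which here reduces to Σ(Xi − θ) = 0.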