The first phase draws a large sample s1 of n1 points xi ∈ s1 that are independently and uniformly distributed within the forest area F. At each of those points auxiliary information is collected, very often of a purely qualitative nature (e.g. from the interpretation of aerial photographs). The second phase draws a small sample s2 ⊂ s1 of n2 points from s1 by equal-probability sampling without replacement. At each point x ∈ s2 the terrestrial inventory provides the local density Y(x). For points x ∈ s1 \ s2, i.e. in the large but not in the small sample, only the auxiliary information is available. Nevertheless, this allows one to make a prediction Ŷ(x) of the true local density Y(x) in the forest. Strictly speaking, we shall assume that this prediction is given by an external model and is not adjusted with the data from the actual inventory. Let us examine probably the most important example: stand structure is determined through the interpretation of aerial photographs in s1, which, via pre-existing yield tables, allows the inventorist to make a reasonable prediction of the timber volume per ha or the number of stems per ha. If an external model is not available, however, an internal model first has to be fitted. Usually this is done by coding the auxiliary information at point x into a vector Z(x) ∈ ℝᵐ. A prediction is then obtained with a linear model, i.e. Ŷ(x) = Z(x)ᵗβ (the upper index t indicating transposition). The unknown parameter vector β can be estimated in several ways, in particular by completely ignoring sampling theory and using standard statistical tools for a linear model, i.e. by performing a linear regression of Y(x) on Z(x). Alternatively, one can estimate β within the framework of sampling theory. There is some evidence that the choice of the estimation procedure is of secondary importance and that internal models can be treated as external models if n2 is sufficiently large (Mandallaz, 1991).
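The internal-model fit described above can be sketched numerically. The following is a minimal illustration with invented data (the forest, the covariate, and β are all hypothetical): the least-squares fit of β uses only the terrestrial points in s2, and the fitted model then yields predictions Ŷ(x) = Z(x)ᵗβ̂ on all of s1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical first-phase sample s1: each point carries an auxiliary
# vector Z(x) (here an intercept plus one photo-interpreted covariate).
n1, n2 = 200, 50
Z = np.column_stack([np.ones(n1), rng.uniform(0.0, 1.0, n1)])

# Terrestrial subsample s2 ⊂ s1 (equal-probability, without replacement).
s2 = rng.choice(n1, size=n2, replace=False)

# Local densities Y(x) are observed only on s2 (simulated here for
# illustration with an invented beta_true and measurement noise).
beta_true = np.array([2.0, 3.0])
Y_s2 = Z[s2] @ beta_true + rng.normal(0.0, 0.5, n2)

# Internal model: ordinary least-squares regression of Y(x) on Z(x)
# with the s2 data, giving predictions Ŷ(x) for every x ∈ s1.
beta_hat, *_ = np.linalg.lstsq(Z[s2], Y_s2, rcond=None)
Y_pred = Z @ beta_hat
```

With n2 = 50 points the fitted β̂ recovers the generating coefficients closely; in practice the quality of the fit depends on how informative the coded auxiliary vector Z(x) is.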
However, predictions based on models, particularly external ones, should not be blindly trusted. Moreover, it is intuitively clear that one should account for deviations between model and reality. By analogy with the model-assisted procedures discussed in Section 3.3, we examine the residual R(x) = Y(x) − Ŷ(x) and define the two-phase one-stage estimator

Ŷreg = (1/n1) ∑_{x∈s1} Ŷ(x) + (1/n2) ∑_{x∈s2} R(x)    (5.1)

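Numerically, Eq. 5.1 is just two sample means. A minimal sketch with invented values (predictions Ŷ(x) over s1, and observed densities with their predictions on s2):

```python
import numpy as np

# Invented illustration data: predictions Ŷ(x) on the large sample s1,
# and observed densities Y(x) with their predictions on the subsample s2.
Y_pred_s1 = np.array([310.0, 295.0, 280.0, 330.0, 305.0, 290.0])  # x ∈ s1
Y_s2      = np.array([318.0, 286.0])                              # x ∈ s2
Y_pred_s2 = np.array([305.0, 290.0])                              # same points

# Residuals R(x) = Y(x) − Ŷ(x), available only on s2.
R_s2 = Y_s2 - Y_pred_s2

# Two-phase one-stage regression estimator (Eq. 5.1): the mean of the
# predictions over s1 plus the mean of the residuals over s2.
Y_reg = Y_pred_s1.mean() + R_s2.mean()
```

Here the residual mean (4.5) corrects the prediction mean upward, exactly the "mean of the prediction plus mean of the residuals" structure discussed below.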
The lower index reg indicates that the two-phase estimator is indeed a model-assisted regression estimator. It is simply the mean of the predictions plus the mean of the residuals, which is intuitively very appealing. Given s1, the properties of Eq. 5.1 are governed by standard sampling theory for a finite population (see Eq. 2.16) and the conditioning rules given in Appendix B. This leads immediately to

E2|1[ (1/n2) ∑_{x∈s2} R(x) ] = (1/n1) ∑_{x∈s1} R(x)


Because Y(x) = Ŷ(x) + R(x), we have by Eqs. B.3 and 4.6

E1,2(Ŷreg) = E1[ (1/n1) ∑_{x∈s1} Y(x) ] = Ȳ

and therefore unbiasedness. Furthermore

V2|1[ (1/n2) ∑_{x∈s2} R(x) ] = (1 − n2/n1) (1/n2) · (1/(n1 − 1)) ∑_{x∈s1} (R(x) − R̄1)²

where we set R̄i = (1/ni) ∑_{x∈si} R(x) for i = 1, 2. Consequently we have

E1V2|1(Ŷreg) = (1 − n2/n1) (1/n2) · (1/λ(F)) ∫_F (R(x) − R̄)² dx


where R̄ = (1/λ(F)) ∫_F R(x) dx.


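Both conditional moments above can be checked numerically: holding a first-phase sample of residuals fixed and repeatedly drawing s2 by equal-probability sampling without replacement, the mean of R̄2 over the draws should reproduce R̄1, and its variance should reproduce (1 − n2/n1)(1/n2)·(1/(n1 − 1)) ∑_{x∈s1} (R(x) − R̄1)². A minimal Monte Carlo sketch with invented residuals:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed first-phase residuals R(x), x ∈ s1 (invented for illustration).
n1, n2 = 200, 40
R_s1 = rng.normal(0.0, 10.0, n1)
R_bar1 = R_s1.mean()

# Theoretical conditional variance of the residual mean over s2
# under simple random sampling without replacement from s1.
V_theory = (1 - n2 / n1) / n2 * np.sum((R_s1 - R_bar1) ** 2) / (n1 - 1)

# Monte Carlo: draw s2 ⊂ s1 many times and record the mean over s2.
reps = 20000
R_bar2 = np.array([rng.choice(R_s1, size=n2, replace=False).mean()
                   for _ in range(reps)])

# R_bar2.mean() ≈ R_bar1 (conditional expectation E2|1) and
# R_bar2.var() ≈ V_theory (conditional variance V2|1), up to
# Monte Carlo error.
```

The finite-population correction (1 − n2/n1) matters here: with n2/n1 = 0.2 it shrinks the variance by a fifth relative to sampling with replacement.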
Let us emphasize once again that we do not assume that the mean residual is zero; with external models this is never the case. With an internal model and least-squares estimation of β, R̄2 is usually zero by construction, or nearly so. We can now state the main results for two-phase one-stage simple random sampling in the following theorem.

Theorem 5.1.1.