ABSTRACT

We now define the terminology and notation that will be used throughout this work. A particular population $P$ of $N$ individuals (sometimes also called elements or units) $\{u_1, u_2, \ldots, u_N\}$ is identified by the labels $i = 1, 2, \ldots, N$. $P$ may consist of all the students at ETH, of all the trees of Switzerland (where, in this case, $N$ is unknown), or of all the employees older than 18 years on August 1st 2007 in Switzerland. In the set-theoretical sense it must be clear whether something belongs to the population $P$ or not. Surprisingly, this seemingly simple requirement can be the source of great problems in applications (what is a tree, an unemployed person, etc.?). Defining the population under study is a key task at the planning stage, often requiring intensive discussions and frustrating compromises, a matter we shall not discuss any further in this book.

For each individual $i$ in $P$ one is interested in $p$ response variables with numerical values $Y_i^{(m)}$, $m = 1, \ldots, p$, $i = 1, \ldots, N$, which can be measured at a given time point in an error-free manner. Note that any qualitative variable can always be coded numerically with a set of 0/1 indicator variables. Whenever ambiguity is excluded we shall drop the upper index that identifies the response variable. The error-free assumption can be problematic even when dealing with physical quantities (e.g. the volume of a tree) and can also be a source of great difficulties in the case of non-response during interviews. Usually the quantities of primary interest are population totals, means and variances. That is,

$$Y^{(m)} = \sum_{i=1}^{N} Y_i^{(m)} \tag{1.1}$$

$$\bar{Y}^{(m)} = \frac{Y^{(m)}}{N} \tag{1.2}$$

$$S^2_{Y^{(m)}} = \frac{\sum_{i=1}^{N} \left(Y_i^{(m)} - \bar{Y}^{(m)}\right)^2}{N - 1} \tag{1.3}$$
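As a small illustration of the total, mean and variance just defined, the following Python sketch evaluates them on a complete, error-free listing of a toy population. The five volume values and the function names are hypothetical and chosen only for the example; in practice these quantities are not observed for the whole population and have to be estimated from a sample.

```python
# Hypothetical toy population: stem volumes (m^3) of N = 5 trees.
# In a real inventory the full list of Y_i is not available.
y = [0.8, 1.2, 0.5, 2.0, 1.5]


def population_total(values):
    """Population total: the sum of Y_i over all N individuals."""
    return sum(values)


def population_mean(values):
    """Population mean: the total divided by the population size N."""
    return population_total(values) / len(values)


def population_variance(values):
    """Population variance with the divisor N - 1, as in the definition above."""
    n = len(values)
    if n < 2:
        raise ValueError("the variance needs at least two individuals")
    mean = population_mean(values)
    return sum((v - mean) ** 2 for v in values) / (n - 1)


total = population_total(y)
mean = population_mean(y)
variance = population_variance(y)
print(total, mean, variance)
```

Note the divisor $N - 1$ rather than $N$ in the variance, which follows the definition used in the text.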

Sometimes, more complicated statistical characteristics of the population are needed, such as ratios, covariances, or correlations

$$R_{l,m} = \frac{Y^{(m)}}{Y^{(l)}} \tag{1.4}$$

$$C_{l,m} = \frac{\sum_{i=1}^{N} \left(Y_i^{(l)} - \bar{Y}^{(l)}\right)\left(Y_i^{(m)} - \bar{Y}^{(m)}\right)}{N - 1} \tag{1.5}$$

$$\rho_{l,m} = \frac{C_{l,m}}{\sqrt{S^2_{Y^{(l)}} \, S^2_{Y^{(m)}}}} \tag{1.6}$$
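The ratio, covariance and correlation can be sketched in the same way for two response variables measured on every individual. The data below (a DBH in cm as variable $l$ and a volume in m$^3$ as variable $m$) and all function names are hypothetical. The correlation is computed as the covariance divided by the square root of the product of the two variances, which is the standard definition and is assumed here to be what (1.6) denotes.

```python
import math

# Hypothetical toy population of N = 5 trees.
y_l = [15.0, 20.0, 30.0, 35.0, 40.0]  # variable l: DBH in cm
y_m = [0.5, 0.8, 1.2, 1.5, 2.0]       # variable m: volume in m^3


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance with the divisor N - 1."""
    if len(a) != len(b):
        raise ValueError("both variables must cover the same individuals")
    n = len(a)
    if n < 2:
        raise ValueError("the covariance needs at least two individuals")
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def ratio(a, b):
    """Ratio of totals with the total of b in the numerator: sum(b) / sum(a)."""
    return sum(b) / sum(a)


def correlation(a, b):
    """Covariance scaled by the square root of the product of the variances."""
    var_a = covariance(a, a)  # covariance of a variable with itself = variance
    var_b = covariance(b, b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return covariance(a, b) / math.sqrt(var_a * var_b)


r_lm = ratio(y_l, y_m)
c_lm = covariance(y_l, y_m)
rho_lm = correlation(y_l, y_m)
print(r_lm, c_lm, rho_lm)
```

The guard in `correlation` matters for indicator-type variables: a variable that is constant over the population has zero variance, so its correlation with any other variable is undefined.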

In any case, the estimation of totals will be a key issue. In a forest inventory, the spatial mean of additive quantities is frequently more important than the population total. Suppose that a forested area $F$ with a surface area $\lambda(F)$ in ha contains a well-defined population of $N$ trees. Moreover, say that all trees have a diameter of at least 12 cm at 1.3 m above the ground (diameter at breast height, or DBH) and that the response variables of interest are $Y_i^{(1)} \equiv 1$ and $Y_i^{(2)} =$ volume in m$^3$. Then the spatial mean $\bar{Y}_s^{(m)} = Y^{(m)} / \lambda(F)$ represents the number of stems per ha ($m = 1$) and the volume per ha ($m = 2$), respectively. Note that $N$ is usually unknown and will have to be estimated via the variable $Y_i^{(1)}$. Likewise, the mean volume per tree can be obtained by estimating the ratio $R_{1,2} = Y^{(2)} / Y^{(1)}$.
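The forest example can be made concrete with a minimal sketch. The tree volumes, the area of 0.02 ha and the helper name `spatial_mean` are hypothetical. The totals are computed from a complete listing purely for illustration, whereas in an actual inventory they would be estimated from sample plots.

```python
# Hypothetical stand: volumes (m^3) of all trees with DBH >= 12 cm
# growing on a forested area of 0.02 ha.
volumes_m3 = [0.5, 0.8, 1.2, 1.5, 2.0]
area_ha = 0.02

# First response variable: the indicator that is identically 1,
# so that its population total equals the number of trees N.
indicator = [1 for _ in volumes_m3]


def spatial_mean(total, area):
    """Spatial mean: a population total divided by the surface area in ha."""
    if area <= 0:
        raise ValueError("the surface area must be positive")
    return total / area


y1_total = sum(indicator)   # number of trees N
y2_total = sum(volumes_m3)  # total volume in m^3

stems_per_ha = spatial_mean(y1_total, area_ha)
volume_per_ha = spatial_mean(y2_total, area_ha)

# Mean volume per tree: the ratio of the two totals (volume over count).
mean_volume_per_tree = y2_total / y1_total

print(stems_per_ha, volume_per_ha, mean_volume_per_tree)
```

Because both spatial means share the same denominator $\lambda(F)$, the mean volume per tree is also the ratio of the volume per ha to the number of stems per ha.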