ABSTRACT

The standard design-based methods for analyzing sample survey data use no models at the analysis stage. For surveys that are self-weighting, mean estimates (and mean estimates within subpopulations, including those based on areas) are simply the averages of the relevant sample observations. For more complex surveys, scaling or weighting of the observations is required to better reflect the structure of the population (or subpopulation) being sampled; the weight that ensures design-unbiased estimates is, for each observation, the inverse of its selection probability. These ideas extend to the estimation of totals and of other parameters of interest. The classic texts on this topic include Hansen et al. (1953) and Cochran (1977). References that also cover more recent developments in sampling theory are Lohr (2010) and Fuller (2009); sample surveys for developing and transition countries are outlined and discussed in United Nations (2005). The underlying idea of probability-based or design-based sampling was developed in the seminal paper of Neyman (1934). These methods are characterized by their use, when estimating for a domain or subpopulation of interest, of only the sample data collected within that domain. Such estimates are called direct estimates. There have also been extensions from design-based to model-based estimation, even for direct estimates; see, for example, Valliant et al. (2000).
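As a minimal sketch of the inverse-probability weighting described above (using standard notation that is not defined in the abstract: $s$ for the sample, $y_i$ for an observed value, and $\pi_i$ for the selection probability of unit $i$), the design-weighted estimators of a population total $Y$ and mean $\bar{Y}$ may be written as
\[
\hat{Y} = \sum_{i \in s} \frac{y_i}{\pi_i},
\qquad
\hat{\bar{Y}} = \frac{\sum_{i \in s} y_i/\pi_i}{\sum_{i \in s} 1/\pi_i},
\]
where weighting each observation by $1/\pi_i$ makes $\hat{Y}$ unbiased with respect to the sampling design; in a self-weighting survey all $\pi_i$ are equal, and $\hat{\bar{Y}}$ reduces to the simple sample average.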