Handling Missing Data: Overview and Introduction
Missing data occur in almost every survey dataset when participation is not mandatory. They occur either as missing items, when some information on otherwise observed units, such as individuals or households, is not observed, or as missing units, when sampled objects are not observed at all. Depending on the mechanism that led to the missing data, the observed part of the dataset may not be a simple random subsample of the complete sample. Standard inference methods that would have been applied to the complete sample may therefore fail to yield valid inferences if applied to the observed subsample only. But even if the observed data can be interpreted as a simple random subsample of the complete sample, standard analysis tools may not be able to handle incompletely observed units adequately. Methods are thus needed to analyze incomplete datasets and to compensate for possible bias introduced by the missing mechanism. So-called ad hoc methods, like ignoring the missing mechanism or imputing (conditional) means, have been shown either not to work or to work only in very specific situations (e.g., Horton & Kleinman, 2007). On the other hand, methods that allow for valid inferences in the analysis of interest are more demanding; these include weighting (e.g., Wooldridge, 2002), mainly adopted to compensate for missing units, and multiple imputation (Rubin, 1987), usually used to compensate for missing items.