ABSTRACT

We present statistical background for multiple imputation analysis. We briefly discuss the two main strategies for drawing statistical inferences, frequentist and Bayesian analysis; multiple imputation sits at the intersection of the two. We present fundamental concepts including estimand, estimator and estimate, bias, variance, confidence interval, coverage, and hypothesis testing. For Bayesian analysis, we discuss using sampling algorithms to obtain draws from posterior distributions and to make inferences from them. Relevant methods include Markov chain Monte Carlo (MCMC) algorithms such as Gibbs sampling. From the frequentist perspective, the bootstrap is a commonly used resampling method for estimating variance. In some cases, the bootstrap can also be used to approximate the posterior distribution of the parameters of interest. We also introduce the likelihood-based approach to missing data problems under different assumptions about the missingness mechanism. Multiple imputation is closely connected with the likelihood-based approach yet provides more flexibility for practical analysis. In addition, we recommend avoiding ad hoc missing data approaches, including complete-case analysis (case-wise deletion) and regression prediction. For missing data analysis, it is important to use simulation studies to assess and compare the performance of alternative methods. Simulated and real-data examples based on the Research and Development Survey (RANDS) illustrate these concepts and ideas.