ABSTRACT

Parametric imputation models can suffer from model misspecification. We present several strategies that adjust the imputation model to better fit various complex features of real data. When continuous data deviate from normality, one common option is to apply a transformation. We discuss when a transformation is necessary and how it should be carried out in imputation. We also discuss using smoothing regression methods to model possible nonlinear relations between the variable with missing values and other covariates. When data are bounded, one simple adjustment is to draw missing values from a truncated normal distribution so that the imputed values stay within the allowed range. A popular robust imputation method is predictive mean matching (PMM). It has its roots in hot-deck imputation and replaces each missing value with an observed value taken from donors that are nearest neighbors under some distance function. PMM ensures that the imputed values fall within the range of the observed data. Perhaps the most important principle in multiple imputation is the inclusive imputation strategy. That is, we strive to include in the imputation as much information as possible from the observed data, along with other relevant information. Compared with a restrictive imputation strategy, the inclusive strategy can yield estimates with less bias and greater precision. Another robust imputation option is the dual modeling strategy, which uses information from both the outcome model and the propensity score model.
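
As an illustration of the donor-matching idea behind PMM, the following is a minimal sketch rather than the procedure developed in the chapter. It makes several assumptions that are not in the text: a linear regression supplies the predicted means, distance is the absolute difference between predicted means, and one donor is drawn at random from the k closest observed cases. The function name pmm_impute, its arguments, and the simulated data are hypothetical. For brevity the regression coefficients are fixed at their least-squares estimates; a proper multiple imputation would also draw them from their posterior distribution before matching.

```python
import numpy as np

def pmm_impute(x, y, k=5, rng=None):
    """Fill NaN entries of y once by predictive mean matching on x (sketch)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    obs = ~np.isnan(y)

    # Fit a linear model on the observed cases (assumed imputation model).
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    pred = X @ beta  # predicted means for all cases

    donors_pred = pred[obs]
    donors_y = y[obs]
    k = min(k, donors_y.size)

    y_imp = y.copy()
    for i in np.flatnonzero(~obs):
        # Distance between predicted means defines the nearest neighbors.
        dist = np.abs(donors_pred - pred[i])
        nearest = np.argsort(dist)[:k]
        # Impute with the observed value of a randomly chosen donor.
        y_imp[i] = donors_y[rng.choice(nearest)]
    return y_imp

# Hypothetical example: a skewed, strictly positive outcome with ~30% missing.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.exp(1.0 + 0.5 * x + rng.normal(scale=0.3, size=200))
y_mis = y.copy()
y_mis[rng.random(200) < 0.3] = np.nan
y_imp = pmm_impute(x, y_mis, k=5, rng=1)
```

Because every imputed value is copied from an observed case, the imputations in this sketch cannot fall outside the observed range, even though the linear working model is misspecified for the skewed outcome.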