Multiple Imputation for Multivariate Missing Data: The Joint Modeling Approach

Chapter

ABSTRACT

Missing data often occur on multiple variables that are typically related to each other in a dataset. We present the joint modeling (JM) strategy for imputing multiple incomplete variables. JM starts by positing a Bayesian joint model for the multivariate data and then uses the data augmentation algorithm to impute the missing values. JM for monotone missing data is relatively straightforward. For data with general missingness patterns, we briefly discuss several JM strategies that depend on the type of variables. These methods include multivariate normal models for continuous variables, log-linear models and latent variable models for categorical variables, and general location models and latent variable models for a mixture of continuous and categorical variables. Imputation algorithms under these models are available in SAS PROC MI and in R (e.g., the packages norm, mix, cat, and jomo). Examples using U.S. hospital performance data and the BRFSS survey are given. More challenging problems can arise when the outcome and some covariates are missing in a targeted regression analysis. In that case, the JM strategy should include both the analysis model and the covariate model. When the Bayesian joint model is complicated, we propose using WinBUGS (or more recent variants such as OpenBUGS and the R package nimble) to conduct the imputation. Using WinBUGS saves users from deriving and coding the complicated posterior distributions themselves. We provide WinBUGS examples on imputing incomplete interaction terms in a regression analysis and on imputing data from a normal mixture distribution.
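As a rough illustration of the data augmentation idea under a multivariate normal model, the following Python sketch alternates between drawing the model parameters given the completed data (P-step) and drawing the missing values given the parameters (I-step). It is a minimal sketch, not the chapter's implementation: the function names, the noninformative prior p(mu, Sigma) ∝ |Sigma|^{-(p+1)/2}, the mean-fill starting values, and the burn-in and thinning settings are all assumptions made for illustration.

```python
import numpy as np

def draw_mvn_params(Y, rng):
    """P-step: draw (mu, Sigma) given completed data Y.
    Assumes the noninformative prior p(mu, Sigma) ∝ |Sigma|^{-(p+1)/2}."""
    n, p = Y.shape
    ybar = Y.mean(axis=0)
    S = (Y - ybar).T @ (Y - ybar)
    # Sigma | Y ~ Inverse-Wishart(n - 1, S): draw W ~ Wishart(n - 1, S^{-1})
    # as a sum of outer products of normal vectors, then invert.
    X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(S), size=n - 1)
    Sigma = np.linalg.inv(X.T @ X)
    Sigma = (Sigma + Sigma.T) / 2  # remove numerical asymmetry
    # mu | Sigma, Y ~ N(ybar, Sigma / n)
    mu = rng.multivariate_normal(ybar, Sigma / n)
    return mu, Sigma

def impute_missing(Y, miss, mu, Sigma, rng):
    """I-step: draw each row's missing entries from their conditional
    normal distribution given that row's observed entries."""
    Y = Y.copy()
    for i in range(Y.shape[0]):
        m = miss[i]
        if not m.any():
            continue
        o = ~m
        if o.any():
            B = Sigma[np.ix_(m, o)] @ np.linalg.inv(Sigma[np.ix_(o, o)])
            cond_mean = mu[m] + B @ (Y[i, o] - mu[o])
            cond_cov = Sigma[np.ix_(m, m)] - B @ Sigma[np.ix_(o, m)]
            cond_cov = (cond_cov + cond_cov.T) / 2
        else:
            # Row is entirely missing: draw from the marginal distribution.
            cond_mean, cond_cov = mu[m], Sigma[np.ix_(m, m)]
        Y[i, m] = rng.multivariate_normal(cond_mean, cond_cov)
    return Y

def data_augmentation(Y_obs, n_imputations=5, burn_in=200, thin=50, seed=0):
    """Return a list of completed datasets; NaN marks missing entries."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(Y_obs)
    # Crude starting values (an assumption): observed column means.
    Y = np.where(miss, np.nanmean(Y_obs, axis=0), Y_obs)
    imputations = []
    for t in range(1, burn_in + n_imputations * thin + 1):
        mu, Sigma = draw_mvn_params(Y, rng)
        Y = impute_missing(Y, miss, mu, Sigma, rng)
        # Keep every `thin`-th draw after burn-in to reduce autocorrelation.
        if t > burn_in and (t - burn_in) % thin == 0:
            imputations.append(Y)
    return imputations
```

Each returned dataset would then be analyzed separately and the results combined with the usual multiple imputation combining rules. In practice, the established software named above is preferable to a hand-written sampler like this one.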
