ABSTRACT

The major criticism of simple imputation methods is that they underestimate the variance (see previous chapter). Multiple imputation [Rubin and Schenker, 1986, Rubin, 1987] rectifies this problem by incorporating both the variability of the HRQoL measure and the uncertainty about the missing observations. Multiple imputation of missing values will be worth the effort only if there is a substantial benefit that cannot be obtained using methods that assume the missing data are ignorable, such as maximum likelihood for the analysis of incomplete data (Chapters 3 and 4). As mentioned in the previous chapter, this requires auxiliary information, such as assessments by other observers (caregivers) or clinical outcomes that are strongly correlated with the HRQoL measure. The previous comments about the complexities of actually implementing an imputation scheme when the trial involves both longitudinal data and multiple measures are even more relevant here. The following quote summarizes the concern:

... multiple imputation is not a panacea. Although it is a powerful and useful tool applicable to many missing data settings, if not used carefully it is potentially dangerous. The existence of software that facilitates its use requires the analyst to be careful about the verification of assumptions, the robustness of imputation models and the appropriateness of inferences. For more complicated models (e.g., longitudinal or clustered data) this is even more important. (Norton and Lipsitz [2001])

The basic strategy of multiple imputation is to impute 3 to 20 sets of values for the missing data. Each set of data is then analyzed using complete data methods and the results of the analyses are then combined. The general

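The combining step of the basic strategy (impute several data sets, analyze each with complete data methods, then pool the results) can be sketched as follows. This is a minimal illustration, assuming the standard combining rules of Rubin [1987]; the function name `pool_rubin` and its inputs are hypothetical and stand in for whatever estimates and variances the complete data analyses produce.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m complete-data results using Rubin's combining rules.

    estimates: one point estimate per imputed data set
    variances: the matching squared standard errors
    (Hypothetical helper for illustration only.)
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")

    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between

    # Degrees of freedom for the t reference distribution; infinite when
    # the imputations agree exactly (no between-imputation variance).
    if between > 0:
        ratio = within / ((1 + 1 / m) * between)
        df = (m - 1) * (1 + ratio) ** 2
    else:
        df = float("inf")

    return {"estimate": q_bar, "within": within, "between": between,
            "total": total, "df": df}


pooled = pool_rubin([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
```

With three imputations giving estimates of 10, 12 and 11, each with variance 4, the pooled estimate is 11 and the total variance is 4 + (4/3)(1), or about 5.33. The between-imputation term is what a single imputation omits, which is why simple imputation underestimates the variance.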

Step 1: Selection of the Imputation Procedure

Although the least technical, selection of an appropriate imputation procedure is the most difficult and the most critical step. There are a variety of implicit and explicit methods that can be used for multiple imputation [Rubin and Schenker, 1986]. Explicit methods generally utilize regression models whereas implicit methods utilize sampling techniques. Four specific examples of these strategies are described in the subsequent sections.
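The distinction between explicit and implicit methods can be illustrated with a small sketch. The code below is an assumption-laden toy example, not one of the four procedures described later: the explicit function imputes from a fitted simple linear regression plus a random residual, and the implicit function draws donors at random from the observed values (a simple hot deck). All function and variable names are hypothetical, and a proper multiple imputation would also reflect the uncertainty in the regression parameters, which is omitted here.

```python
import random
from statistics import mean


def explicit_regression_impute(x_obs, y_obs, x_mis, rng):
    """Explicit (model-based): predicted value plus a random residual draw.

    Simplification: the fitted coefficients are treated as known.
    Requires at least three observed cases.
    """
    x_bar, y_bar = mean(x_obs), mean(y_obs)
    sxx = sum((x - x_bar) ** 2 for x in x_obs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_obs, y_obs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(x_obs, y_obs)]
    resid_sd = (sum(r * r for r in residuals) / (len(residuals) - 2)) ** 0.5
    return [intercept + slope * x + rng.gauss(0.0, resid_sd) for x in x_mis]


def implicit_hot_deck_impute(y_obs, n_missing, rng):
    """Implicit (sampling-based): copy the value of a randomly chosen donor."""
    return [rng.choice(y_obs) for _ in range(n_missing)]


rng = random.Random(2024)                   # fixed seed for reproducibility
covariate_obs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical auxiliary measure
hrqol_obs = [52.0, 55.0, 61.0, 63.0, 70.0]  # hypothetical observed HRQoL scores
covariate_mis = [2.5, 4.5]                  # covariate values where HRQoL is missing

explicit_draws = explicit_regression_impute(covariate_obs, hrqol_obs, covariate_mis, rng)
implicit_draws = implicit_hot_deck_impute(hrqol_obs, len(covariate_mis), rng)
```

Repeating either call with fresh random draws yields the multiple sets of imputed values; the explicit version relies on the regression model being adequate, whereas the implicit version can only reproduce values that were actually observed.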