The Impact of Incomplete Covariate Data
The MAR assumption, on the other hand, allows the occurrence of missing values to depend on other, measured covariates or on the outcome. For example, if we know that old subjects refuse to answer questions about sexual activity more often than young subjects, and age is one of the covariates in the regression model, then the missing values in the variable sexual activity may still occur at random, provided that, given age, they are not related to the true answer.

Missing dependent on X (MDX). This assumption allows the occurrence of missing values in any covariate to be related to the true value of this covariate, to the value of any other observed covariate, or to the true value of any other covariate that is missing in the same subject, but it does not allow it to depend on the value of the outcome variable Y. This assumption is typically satisfied in prospective studies, in which data on all covariates are collected prior to the measurement of Y and also prior to the events that are responsible for the final measurement of Y. It is typically more questionable in retrospective studies, in which data on X are collected when the value of Y is already known.

In case-control studies in particular, the MDX assumption is often highly questionable, as the diseased cases will remember their history differently from the healthy controls, or, when the cases are already dead, we have to collect data on X from relatives or patient records. Consequently, we have different missing rates in cases and controls, and hence the MDX assumption is violated (but MAR may still hold!). It is also likely that the difference between cases and controls is not only quantitative, but also qualitative. For example, healthy controls have little reason to refuse an answer on their smoking habits even if they smoke a lot, so that missing values in this subgroup may be regarded as occurring at random.
However, heavy smokers among the diseased cases may feel guilty and hence prefer to refuse to answer.

Remark: There are (rare) situations in which, even in a prospective study, the occurrence of missing values carries information on the value of the outcome. This can happen if both the missing values and the outcome are related to some latent variable, such as the attitude to one’s own disease. Patients who have given up on themselves may be more likely to refuse to answer questions and simultaneously have a worse prognosis than patients who are eager to overcome their disease.
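
The smoking example from the case-control discussion can be made concrete with a small simulation. The following Python sketch is purely illustrative: the population, the variable names (x for heavy smoking, y for case status) and all probabilities are assumptions chosen for the illustration, not values taken from any study. It compares a mechanism under which MDX holds (the chance of a missing value depends on x only) with one under which it is violated (controls refuse at a constant rate, heavy smokers among the cases refuse much more often), and tabulates the missing rate within each combination of x and y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical population: x = 1 marks a heavy smoker, y = 1 a diseased case.
x = rng.binomial(1, 0.3, size=n)
y = rng.binomial(1, 0.2 + 0.3 * x)

# Mechanism A (MDX holds): the chance that x is missing depends on x only.
p_mdx = 0.10 + 0.20 * x
# Mechanism B (MDX violated): controls refuse at a constant rate,
# while heavy smokers among the cases refuse far more often.
p_not_mdx = 0.05 + 0.40 * x * y

miss_mdx = rng.random(n) < p_mdx
miss_not_mdx = rng.random(n) < p_not_mdx


def missing_rates(missing, x, y):
    """Missing rate in each (x, y) cell, returned as a 2x2 array indexed [x, y]."""
    rates = np.empty((2, 2))
    for xv in (0, 1):
        for yv in (0, 1):
            cell = (x == xv) & (y == yv)
            rates[xv, yv] = missing[cell].mean()
    return rates


rates_mdx = missing_rates(miss_mdx, x, y)
rates_not_mdx = missing_rates(miss_not_mdx, x, y)

print("MDX holds (rows: x, columns: y)")
print(rates_mdx.round(3))
print("MDX violated (rows: x, columns: y)")
print(rates_not_mdx.round(3))
```

Under mechanism A the two entries in each row agree up to sampling noise, so within a level of x the missing rate does not depend on y. Under mechanism B the row for heavy smokers differs markedly between controls and cases. Note that such a table is available only in a simulation: in real data the true value of x is unknown for exactly those subjects who did not answer.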