Chapter 16

Selection of Covariates

If a multiple logistic regression model is fitted to this data set, we obtain the following output:

variable   beta    SE      95% CI             p-value
height     0.026   0.026   [-0.025, 0.077]    0.319
weight     0.030   0.018   [-0.006, 0.065]    0.098

Neither height nor weight reaches statistical significance, and the confidence intervals of the effects are rather wide. In contrast, if we perform two simple logistic regression analyses with no adjustment, we obtain significant effects for both covariates:

variable   beta    SE      95% CI             p-value
weight     0.042   0.014   [0.015, 0.068]     0.002

variable   beta    SE      95% CI             p-value
height     0.055   0.019   [0.017, 0.093]     0.005
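The intervals and p-values in these tables are close to what a Wald calculation gives from the reported estimates and standard errors. The following is a minimal sketch, assuming Wald-type intervals (estimate plus or minus 1.96 standard errors) and a two-sided p-value from the standard normal distribution; the text does not state how the output was computed, and `wald_summary` is a hypothetical helper name. Because the inputs are the rounded values from the tables, the last digit can differ slightly from the printed output.

```python
import math

def wald_summary(beta, se, z_crit=1.96):
    """Return (lower, upper, p) for a Wald interval and two-sided normal p-value.

    Assumption: the reported output is Wald-based; the source does not say so.
    """
    lower = beta - z_crit * se
    upper = beta + z_crit * se
    z = beta / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal
    p = math.erfc(abs(z) / math.sqrt(2))
    return lower, upper, p

# (beta, SE) pairs copied from the tables in the text
reported = {
    "height, adjusted": (0.026, 0.026),
    "weight, adjusted": (0.030, 0.018),
    "weight, unadjusted": (0.042, 0.014),
    "height, unadjusted": (0.055, 0.019),
}

results = {name: wald_summary(b, s) for name, (b, s) in reported.items()}
for name, (lower, upper, p) in results.items():
    print(f"{name:20s} [{lower:6.3f}, {upper:6.3f}]  p = {p:.3f}")
```

The adjusted intervals include zero and the unadjusted ones do not, which is the contrast between the two sets of analyses.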

There is nothing strange about this marked difference between the two analyses. It simply reflects what we have already seen in our visual inspection of Figure 16.1: we can clearly see an association of Y with both body height and body weight, but it is hard to separate the two effects from each other. Remember that in a multiple regression model, we investigate the effect of one covariate while keeping the values of the other covariates fixed. In this example, this means that the effect of height is investigated by selecting any body weight w and then studying how the frequency of Y = 1 increases with increasing height. Due to the high correlation between height and weight, however, there is only limited variation in height for any given, fixed weight, as visualised in Figure 16.2. This makes it very difficult to assess the effect of height adjusted for the effect of weight. (We have already used this argument in a more general context in Section 14.3.)
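The limited variation of height at a fixed weight can be made concrete with a small simulation. This is only a sketch on hypothetical data: the means, standard deviations, and the correlation of 0.8 are assumptions chosen for illustration and are not taken from the data set of this chapter. For a bivariate normal distribution, the standard deviation of height given weight is the overall standard deviation multiplied by sqrt(1 - rho^2), so with rho = 0.8 only about 60% of the spread remains.

```python
import random
import statistics

def simulate(n=2000, rho=0.8, seed=1):
    """Draw hypothetical (height, weight) pairs with correlation rho.

    All numeric settings are illustrative assumptions.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rng.gauss(0, 1)
        height = 170 + 10 * z1                                    # cm
        weight = 70 + 12 * (rho * z1 + (1 - rho**2) ** 0.5 * z2)  # kg
        pairs.append((height, weight))
    return pairs

pairs = simulate()

# Spread of height over all subjects
sd_all = statistics.stdev(h for h, _ in pairs)

# Spread of height among subjects with (nearly) the same weight
band = [h for h, w in pairs if 69 <= w <= 71]
sd_band = statistics.stdev(band)

print(f"SD of height, all subjects:   {sd_all:.1f} cm")
print(f"SD of height, weight 69-71kg: {sd_band:.1f} cm (n = {len(band)})")
```

Within the narrow weight band there is much less variation in height to learn from, and fewer subjects, which is why the adjusted effect of height is estimated with less precision.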

So we learn from this example that in using multiple regression we can easily ask questions which are too difficult to answer and which hence lead to non-significant results. This is especially the case if covariates are highly correlated. Moreover, we learn that it can be wise to present the results of several regression analyses when analysing a data set. If the results of the example in this section are to be published and only the results of the multiple regression model with the adjusted effect estimates are presented, the reader may get the wrong impression that neither body weight nor body height is related to Y. This danger can be avoided by presenting, in addition, the results of the two simple logistic models with no adjustment.
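Reporting adjusted and unadjusted results side by side might look as follows. This is a sketch under stated assumptions, not the analysis behind the tables in this section: the data are simulated (centred height and weight with an assumed correlation of 0.8 and assumed true effects of 0.03 each), and `solve` and `fit_logistic` are hypothetical helpers that fit the logistic model by plain Newton-Raphson using only the standard library. In practice one would use a statistics package.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logistic(xs, y, n_iter=25):
    """Newton-Raphson fit; returns (beta, se) with the intercept first."""
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, yi in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * row[j] * row[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    # Standard errors from the inverse information matrix of the last
    # iteration (at convergence this matches the matrix at the final beta)
    se = [math.sqrt(solve(hess, [float(i == j) for i in range(p)])[j])
          for j in range(p)]
    return beta, se

# Hypothetical data: centred height (cm) and weight (kg), correlation 0.8
rng = random.Random(7)
rho = 0.8
height, weight, y = [], [], []
for _ in range(400):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    h = 10 * z1
    w = 12 * (rho * z1 + math.sqrt(1 - rho**2) * z2)
    prob = 1.0 / (1.0 + math.exp(-(0.03 * h + 0.03 * w)))
    height.append(h)
    weight.append(w)
    y.append(1 if rng.random() < prob else 0)

adj_beta, adj_se = fit_logistic(list(zip(height, weight)), y)
h_beta, h_se = fit_logistic([(h,) for h in height], y)
w_beta, w_se = fit_logistic([(w,) for w in weight], y)

print("model        variable  beta     SE")
print(f"adjusted     height    {adj_beta[1]:.3f}   {adj_se[1]:.3f}")
print(f"adjusted     weight    {adj_beta[2]:.3f}   {adj_se[2]:.3f}")
print(f"unadjusted   height    {h_beta[1]:.3f}   {h_se[1]:.3f}")
print(f"unadjusted   weight    {w_beta[1]:.3f}   {w_se[1]:.3f}")
```

Printing both sets of estimates in one table lets the reader see that the larger standard errors of the adjusted effects stem from the correlation between the covariates and need not indicate an absence of association.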