ABSTRACT

Now that we have some ideas for collecting information on disease and exposure, we had better do some homework on what to do with the data, which probably cost a lot of money to collect! Specifically, we turn to assessing whether an observed association of D and E in a sample of data reflects a population in which D and E are truly associated, or whether it may simply have arisen from the vagaries of random variation. Before we spend too much effort trying to interpret a relationship between marital status and infant birthweight, we want to convince ourselves that we are not simply tilting at windmills, that is, making a fuss about these variables’ apparent association when it could be due to nothing more than the chance variation of sampling. In the language of hypothesis testing, a suitable null hypothesis to address this question is that D and E are independent. This, of course, can also be stated in terms of any of the measures of association introduced in Chapter 4:

H0 : D and E are independent ⇔ RR = 1 ⇔ OR = 1 ⇔ ER = 0

It will be convenient to summarize the sampled data by a 2 × 2 table, as in Table 6.1, and refer to cell entries using the given symbols. Thus, a refers to the number of exposed individuals who developed disease. For this discussion, we consider only the traditional case-control design, although we say more about the analysis of nested case-control and case-cohort designs in the next chapter.
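The equivalent statements of the null hypothesis can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the conventional 2 × 2 layout in which a and b count exposed individuals with and without disease, and c and d count unexposed individuals with and without disease (only a is defined in the text above, so b, c, and d are assumptions here). The function name and the counts are hypothetical, and the table is treated as if it described a whole population, so that risks can be computed directly; with case-control sampling only the odds ratio would be directly estimable.

```python
from fractions import Fraction


def association_measures(a, b, c, d):
    """Return (RR, OR, ER) for a 2 x 2 table.

    Assumed layout (hypothetical, not taken from Table 6.1):
        a = exposed, diseased        b = exposed, not diseased
        c = unexposed, diseased      d = unexposed, not diseased
    All cells are assumed to be nonzero, so no division by zero occurs.
    """
    risk_exposed = Fraction(a, a + b)      # P(D | E)
    risk_unexposed = Fraction(c, c + d)    # P(D | not E)
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = Fraction(a * d, b * c)
    excess_risk = risk_exposed - risk_unexposed
    return relative_risk, odds_ratio, excess_risk


# Made-up counts in which disease risk is 1/5 in both exposure groups,
# so D and E are independent.
independent = association_measures(20, 80, 50, 200)

# Made-up counts in which the exposed group has the higher risk.
associated = association_measures(30, 70, 50, 200)

print(independent)
print(associated)
```

In the first table all three measures sit at their null values (RR = 1, OR = 1, ER = 0) together, while in the second all three move away from them together, which is the equivalence the null hypothesis expresses.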