ABSTRACT

Another situation in which data may be correlated is when repeated measurements are made on the same subject; for example, the weights of a single child recorded at different visits to his or her pediatrician. There are several alternatives for analyzing such data, and the choice depends on the objectives of the investigator. One alternative is to analyze the measurements from each occasion separately, as if they were independent, which results in a loss of information and statistical power. For example, consider the following data on 20 children, each of whom visited his or her pediatrician three times and was weighed at each visit, together with information on whether the child practices a sport on a regular basis (1 = yes vs. 0 = no):

To analyze each visit separately, assuming that the objective is to compare the mean weight of children who do and do not practice a sport regularly, we could fit a simple linear regression of weight on sport at each visit, using the following command lines:

For the first visit:
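A minimal sketch of such a command, assuming the data are stored in long format (one row per child per visit) and that the variables are named weight, sport, and visit (these names are assumptions, not taken from the original dataset):

```stata
* Assumed layout: one row per child per visit; variable names are hypothetical
* Simple linear regression of weight on sport, restricted to the first visit
regress weight sport if visit == 1
```

If the data were instead stored in wide format, with one weight variable per visit, the if qualifier would be dropped and the visit-specific weight variable used as the outcome.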

Output

The results show a significant effect of the predictor sport on mean weight at visit 1 (P-value = .001). Because sport is coded 0/1, the estimated regression coefficient for sport equals the difference between the mean weights of the two sport groups at visit 1 (children who practice a sport minus those who do not). The following command line can be used to compute the observed mean weight by sport:
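One way to obtain these group means, sketched under the assumption of long-format data with the hypothetical variable names weight, sport, and visit:

```stata
* Observed mean weight and group size by sport at visit 1
* (variable names are assumed, not taken from the original dataset)
tabstat weight if visit == 1, by(sport) statistics(mean n)
```

Subtracting the mean for sport = 0 from the mean for sport = 1 should reproduce the coefficient of sport in the visit 1 regression.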

Output

The difference in the mean weights at visit 1 is −3.1, so the children who regularly practice a sport weigh less, on average, than those who do not. To visualize the difference in mean weight at each visit, a line can be drawn connecting the weights estimated by a linear regression model for each sport group. For example, for the first visit, the following Stata commands draw this line and create Figure 11.2:
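A possible sketch of such commands follows. The variable names (weight, sport, visit), the new variable fit1, and the axis labels and title are all assumptions, so the resulting graph may differ in appearance from Figure 11.2:

```stata
* Fit the visit 1 model and store the estimated mean weight for each child
quietly regress weight sport if visit == 1
predict fit1 if e(sample), xb

* Observed weights as points; estimated means for the two groups joined by a line
twoway (scatter weight sport if visit == 1)           ///
       (line fit1 sport if visit == 1, sort),         ///
       xlabel(0 "No" 1 "Yes")                         ///
       xtitle("Practices a sport regularly")          ///
       ytitle("Weight") title("Visit 1") legend(off)
```

Because sport takes only the values 0 and 1, the line simply connects the two estimated group means, and its slope is the regression coefficient of sport.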

We repeat the previous steps for the second visit:
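Under the same assumptions as before (long-format data; hypothetical variable names weight, sport, and visit), the two steps for the second visit could be sketched as:

```stata
* Regression of weight on sport at the second visit (variable names assumed)
regress weight sport if visit == 2

* Observed mean weight by sport group at the second visit
tabstat weight if visit == 2, by(sport) statistics(mean n)
```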

Output

Output

The results also show a significant effect of the predictor sport on mean weight at visit 2 (P-value = .007). The difference in the mean weights at the second visit is −3.3; children who regularly practice a sport weigh less, on average, than those who do not. To draw the estimated weight by sport at visit 2, the following Stata commands are used to create Figure 11.3:
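A sketch of how such a graph might be produced, assuming long-format data and the hypothetical names weight, sport, visit, and fit2; labels and titles are illustrative and need not match Figure 11.3:

```stata
* Estimated mean weights from the visit 2 regression
quietly regress weight sport if visit == 2
predict fit2 if e(sample), xb

* Points for observed weights, line through the two estimated group means
twoway (scatter weight sport if visit == 2)           ///
       (line fit2 sport if visit == 2, sort),         ///
       xlabel(0 "No" 1 "Yes")                         ///
       xtitle("Practices a sport regularly")          ///
       ytitle("Weight") title("Visit 2") legend(off)
```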

The Stata commands for the third visit are:
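As a hedged sketch, again assuming long-format data with the hypothetical variable names weight, sport, and visit:

```stata
* Regression of weight on sport at the third visit (variable names assumed)
regress weight sport if visit == 3

* Observed mean weight by sport group at the third visit
tabstat weight if visit == 3, by(sport) statistics(mean n)
```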

Output

Output

The results do not show a significant effect of the predictor sport on mean weight at visit 3 (P-value > .1); that is, there is no evidence of a difference in the mean weights by sport at the third visit. To draw the estimated weight by sport at the third visit, the following Stata commands are used to create Figure 11.4:
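One way this graph might be drawn, assuming long-format data and the hypothetical names weight, sport, visit, and fit3; the labels are illustrative and need not match Figure 11.4:

```stata
* Estimated mean weights from the visit 3 regression
quietly regress weight sport if visit == 3
predict fit3 if e(sample), xb

* Points for observed weights, line through the two estimated group means
twoway (scatter weight sport if visit == 3)           ///
       (line fit3 sport if visit == 3, sort),         ///
       xlabel(0 "No" 1 "Yes")                         ///
       xtitle("Practices a sport regularly")          ///
       ytitle("Weight") title("Visit 3") legend(off)
```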

11.2 Mixed Models

The other alternative for analyzing correlated data is to use mixed models, also called multilevel models, which allow us to account for the potential correlation between measurements taken on the same subject. For example, the simplest correlation structure arises in a repeated-measures study, in which the same subject is measured several times, as illustrated in Figure 11.5.
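For the children's weight example, a minimal sketch of such a model is a random-intercept model fitted to all three visits at once. It assumes long-format data, the hypothetical variable names weight and sport, and a child identifier named id; it is one possible specification, not necessarily the one developed later in the chapter:

```stata
* Random-intercept model: each child (assumed identifier id) has its own
* intercept, which induces correlation among that child's repeated weights
mixed weight sport || id:

* Estimated intraclass correlation: the share of the total variance in weight
* that is attributable to differences between children
estat icc
```

Unlike the visit-by-visit regressions, this model uses all measurements together, so the effect of sport is estimated from all three visits while the dependence within each child is taken into account.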