ABSTRACT

Sample size calculation and estimation of power are crucial first steps in planning a new study. When we plan longitudinal analyses for a particular study design, we might determine the number of subjects required to detect a meaningful difference between groups, given assumed values for the type I error rate, power, and number of measurements per subject. For cross-sectional clustered data, we might determine the total number of clusters for an assumed number of measurements per cluster. We sometimes refer to the sample size as the required number of subjects, or clusters, although we should keep in mind that the total sample size is the number of subjects multiplied by the assumed number of measurements per subject, or the number of clusters multiplied by the assumed number of measurements per cluster. For example, in a study that collects information on subjects at baseline and at 6 and 12 months post-baseline, if we determine that we need 30 subjects, this means that we will collect a total of 30 × 3 = 90 measurements.

Methods for power and sample size are well developed for independent measurements; for example, release 13.0 of Stata includes an extensive suite of commands for assessment of sample size and power (Stata Press, 2013). When planning a study that will yield longitudinal or clustered data, the standard approaches for independent measurements must be amended to account for the correlation within a subject, or cluster. This chapter describes approaches for correlated data that are equally applicable to QLS and GEE and are based on findings by Shih (1997), who provided formulae that utilize the asymptotic covariance matrix of $\sqrt{m}(\hat{\beta} - \beta)$, which was provided in Equation (2.23). Sample size formulae were also provided in Diggle et al.
(2002).

If we assume that the true correlation structure $T_i(\rho)$ for subject $i$ is correctly specified and is equal to $R_i(\alpha)$, then Equation (2.23) simplifies as follows: