ABSTRACT

Generalized estimating equations play an important role in the analysis of repeated or clustered outcomes of a non-normally distributed type. In this work, they will be used, together with pseudo-likelihood methodology, as a non-likelihood-based method for the analysis of clustered binary data. A comparison between the two will be made in Chapter 6. The use of generalized estimating equations will also be illustrated in the contexts of individual-level covariates and combined continuous and discrete outcomes, in Chapters 13 and 14, respectively. Further applications of the GEE methodology can be found in Section 9.2.6 and Chapter 11.

When we are mainly interested in first-order marginal mean parameters and pairwise interactions, a full likelihood procedure can be replaced by quasi-likelihood methods (McCullagh and Nelder 1989). In quasi-likelihood, the mean response is expressed as a parametric function of covariates; the variance is assumed to be a function of the mean up to possibly unknown scale parameters. Wedderburn (1974) first noted that likelihood and quasi-likelihood theories coincide for exponential families and that the quasi-likelihood “estimating equations” provide consistent estimates of the regression parameters β in any generalized linear model, even for choices of link and variance functions that do not correspond to exponential families.

For clustered and repeated data, Liang and Zeger (1986) proposed so-called generalized estimating equations (GEE or GEE1), which require only the correct specification of the univariate marginal distributions, provided one is willing to adopt “working” assumptions about the association structure. They estimate the parameters associated with the expected value of an individual’s vector of binary responses and phrase the working assumptions about the association between pairs of outcomes in terms of marginal correlations.

Prentice (1988) extended their results to allow joint estimation of probabilities and pairwise correlations. Lipsitz, Laird and Harrington (1991) modified the estimating equations of Prentice (1988) to allow modeling of the association through marginal odds ratios rather than marginal correlations. When adopting GEE1, one does not use information about the association structure to estimate the main effect parameters. As a result, it can be shown that GEE1 yields consistent main effect estimators, even when the association structure is misspecified. However, severe misspecification may seriously affect the efficiency of the GEE1 estimators. In addition, GEE1 should be avoided when some scientific interest is placed on the association parameters.

A second-order extension of these estimating equations (GEE2), which includes the marginal pairwise association as well, has been studied by Liang, Zeger and Qaqish (1992). They note that GEE2 is nearly fully efficient, though bias may occur in the estimation of the main effect parameters when the association structure is misspecified. A variation on this theme, using conditional probability ideas, has been proposed by Carey, Zeger and Diggle (1993). It is referred to as alternating logistic regressions.

In Section 5.1 we present general GEE theory, whereas several applications and specializations to the case of clustered binary data are presented in Section 5.2.
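
To make the GEE1 idea described above concrete, the following minimal sketch fits a marginal logistic model to clustered binary data under the simplest working assumption, independence, and then computes both the model-based ("naive") and the empirically corrected ("robust" or sandwich) variance of the estimates. It is an illustration under stated assumptions, not the procedure developed in the later sections: the function names (`pieces`, `fit_gee_independence`), the restriction to an intercept plus one covariate, the scale parameter fixed at 1, and the toy data are all hypothetical choices made for this example.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def pieces(clusters, beta):
    """Score, 'bread' and 'meat' for logit P(y=1) = beta0 + beta1 * x.

    With a logit link and working independence, the cluster score
    D_i' V_i^{-1} (y_i - mu_i) reduces to X_i' (y_i - mu_i).
    """
    score = [0.0, 0.0]
    bread = [[0.0, 0.0], [0.0, 0.0]]  # sum_i X_i' A_i X_i
    meat = [[0.0, 0.0], [0.0, 0.0]]   # sum_i u_i u_i'
    for cluster in clusters:
        u_i = [0.0, 0.0]  # score contribution of this cluster
        for x, y in cluster:
            mu = expit(beta[0] + beta[1] * x)
            z = (1.0, x)
            for j in range(2):
                u_i[j] += z[j] * (y - mu)
                for k in range(2):
                    bread[j][k] += mu * (1.0 - mu) * z[j] * z[k]
        for j in range(2):
            score[j] += u_i[j]
            for k in range(2):
                meat[j][k] += u_i[j] * u_i[k]
    return score, bread, meat

def fit_gee_independence(clusters, max_iter=50, tol=1e-10):
    """Fisher scoring for the estimating equations, then sandwich variance."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        score, bread, _ = pieces(clusters, beta)
        binv = inv2(bread)
        step = [binv[j][0] * score[0] + binv[j][1] * score[1] for j in range(2)]
        beta = [beta[j] + step[j] for j in range(2)]
        if max(abs(s) for s in step) < tol:
            break
    _, bread, meat = pieces(clusters, beta)
    naive = inv2(bread)                              # valid only if outcomes are independent
    robust = matmul2(matmul2(naive, meat), naive)    # bread^{-1} meat bread^{-1}
    return beta, naive, robust

# Invented toy data: each cluster is a list of (x, y) pairs, x is a
# cluster-level covariate, and responses within a cluster are identical.
clusters = [
    [(0, 0), (0, 0), (0, 0)],
    [(0, 0), (0, 0), (0, 0)],
    [(0, 1), (0, 1), (0, 1)],
    [(1, 1), (1, 1), (1, 1)],
    [(1, 1), (1, 1), (1, 1)],
    [(1, 0), (1, 0), (1, 0)],
]

beta, naive, robust = fit_gee_independence(clusters)
print("beta:", beta)
print("naive SE of slope:", math.sqrt(naive[1][1]))
print("robust SE of slope:", math.sqrt(robust[1][1]))
```

In this toy example the point estimates are those of an ordinary logistic regression, since the working assumption ignores the association. Because the three responses within each cluster are identical, the robust variance of the slope is three times the naive one, which illustrates why the empirically corrected variance is used when the working association structure may be wrong.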