ABSTRACT

It is well known that full maximum likelihood estimation can become prohibitive for many models. For example, in the framework of a marginally specified odds ratio model (Lipsitz, Laird and Harrington 1991, Dale 1986, Molenberghs and Lesaffre 1994, Glonek and McCullagh 1995, Lang and Agresti 1994) for multivariate, clustered binary data, full maximum likelihood estimation is computationally prohibitive, especially when clusters are large. Conditional models such as the Molenberghs and Ryan (1999) model, introduced in Section 4.2, are based on an exponential family model for multivariate binary data and are highly flexible in capturing different patterns of non-linear dependence of the marginal probabilities on cluster size. Like most exponential family models, the Molenberghs and Ryan (1999) model enjoys well-known properties such as linearity of the log-likelihood in the minimal sufficient statistics and unimodality of the likelihood. These properties make the iterative procedures used to compute maximum likelihood estimators numerically stable. In multivariate settings (three or more outcomes), however, the normalizing constant takes a complicated form, and evaluating it can impose computational requirements so heavy that these advantages are lost. This is especially true for clusters of variable size, because the normalizing constant depends on the cluster size and must be evaluated separately for each size that occurs. Hence, alternative estimation methods that avoid explicit calculation of the normalizing constant are in demand. In this chapter, we introduce one such method: pseudo-likelihood estimation.
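To make the computational burden concrete, the sketch below assumes a generic quadratic exponential family for a cluster of n binary outcomes, f(y) = exp{sum_j theta_j y_j + sum_{j<k} omega_jk y_j y_k - A(theta, omega)}, as a stand-in for the exponential family models discussed above. The exact Molenberghs and Ryan (1999) parameterization is not reproduced here, and the function name normalizing_constant is hypothetical. Computed directly, the log normalizing constant A is a sum over all 2^n response vectors, so its cost doubles with each additional cluster member and it has to be recomputed for every distinct cluster size.

```python
import itertools
import math


def normalizing_constant(theta, omega):
    """Log normalizing constant A(theta, omega), computed by brute force.

    Assumed (illustrative) model for one cluster of n binary outcomes:
        f(y) = exp(sum_j theta_j y_j + sum_{j<k} omega_jk y_j y_k - A)
    theta: list of n main-effect parameters.
    omega: n x n list of lists of pairwise association parameters;
           only the entries with j < k are used.
    """
    n = len(theta)
    terms = []
    # Enumerate all 2**n possible binary response vectors of the cluster.
    for y in itertools.product((0, 1), repeat=n):
        s = sum(theta[j] * y[j] for j in range(n))
        s += sum(omega[j][k] * y[j] * y[k]
                 for j in range(n) for k in range(j + 1, n))
        terms.append(s)
    # Log-sum-exp over all configurations, shifted by the maximum for stability.
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# The number of summands grows as 2**n with the cluster size n.
for n in (4, 8, 12):
    theta = [0.2] * n
    omega = [[0.1] * n for _ in range(n)]
    print(n, 2 ** n, round(normalizing_constant(theta, omega), 3))
```

With all omega_jk equal to zero the outcomes are independent and A reduces to sum_j log(1 + exp(theta_j)), which is a convenient check on the enumeration.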

Strictly speaking, this is a non-likelihood method. The principal idea is to replace a numerically challenging joint density by a simpler function that is a suitable product of ratios of likelihoods of subsets of the variables. For example, when a joint density contains a computationally intractable normalizing constant, one might use a suitable product of conditional densities that does not involve this constant. Thus, a bivariate density f(y1, y2) can be replaced by the product of both conditionals, f(y1|y2)f(y2|y1). While the method achieves important computational savings by changing the method of estimation, it does not affect model interpretation: model parameters can be chosen in the same way as with full likelihood and retain their meaning. Estimation converges quickly, and for a range of realistic parameter settings the loss of efficiency relative to full maximum likelihood is minor.
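Under the same illustrative quadratic exponential family assumed earlier (an assumption, not necessarily the chapter's exact model), each full conditional f(y_j | y_k, k != j) is a logistic model with linear predictor eta_j = theta_j + sum_{k != j} omega_jk y_k, so the pseudo-log-likelihood sum_j log f(y_j | rest) involves no normalizing constant. The sketch below computes it in closed form and, as a check, also as a sum of log ratios of the unnormalized joint density, in which A cancels. For n = 2 it is exactly log f(y1|y2) + log f(y2|y1). All function names are hypothetical.

```python
import itertools
import math


def log_joint_unnormalized(y, theta, omega):
    """sum_j theta_j y_j + sum_{j<k} omega_jk y_j y_k (normalizing constant omitted)."""
    n = len(y)
    s = sum(theta[j] * y[j] for j in range(n))
    s += sum(omega[j][k] * y[j] * y[k]
             for j in range(n) for k in range(j + 1, n))
    return s


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_pseudo_likelihood(y, theta, omega):
    """sum_j log f(y_j | y_k, k != j) for one cluster, in closed form."""
    n = len(y)
    total = 0.0
    for j in range(n):
        # Conditional logit of outcome j given all other outcomes
        # (upper-triangle entries of omega, matching log_joint_unnormalized).
        eta = theta[j] + sum(omega[min(j, k)][max(j, k)] * y[k]
                             for k in range(n) if k != j)
        total += y[j] * eta - softplus(eta)
    return total


def log_pseudo_likelihood_ratios(y, theta, omega):
    """Same quantity as a sum of log ratios f(y) / [f(y with y_j=0) + f(y with y_j=1)]."""
    total = 0.0
    for j in range(len(y)):
        y0 = list(y)
        y0[j] = 0
        y1 = list(y)
        y1[j] = 1
        denom = math.log(math.exp(log_joint_unnormalized(y0, theta, omega))
                         + math.exp(log_joint_unnormalized(y1, theta, omega)))
        total += log_joint_unnormalized(y, theta, omega) - denom
    return total


theta = [0.4, -0.3, 0.1]
omega = [[0.0, 0.5, -0.2], [0.5, 0.0, 0.3], [-0.2, 0.3, 0.0]]
for y in itertools.product((0, 1), repeat=3):
    print(y, round(log_pseudo_likelihood(y, theta, omega), 4))
```

The cost per cluster is of order n^2 rather than 2^n, and nothing depends on a size-specific constant, which is what makes clusters of variable size cheap to handle. Summing this quantity over clusters and maximizing it gives the pseudo-likelihood estimator.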