Cluster Modelling for Disease Rate Mapping

ABSTRACT

Statistical methods for analyzing spatial patterns of disease incidence or mortality have been of great interest over the past decade. To a large extent, the statistical approaches taken fall into two classes: cluster detection or disease mapping. In cluster detection, one typically adopts the hypothesis testing framework, testing the null hypothesis of a common disease rate across the study region against a “clustering” alternative (Whittemore et al. 1987, Kulldorff and Nagarwalla 1995). In disease mapping, one typically uses Bayes or empirical Bayes methods to produce smoothed estimates of the cell-specific disease rates suitable for mapping (see, for example, Clayton and Kaldor (1987) and Besag et al. (1991)). In this paper, we describe a method for inference that simultaneously addresses both the cluster detection and the disease mapping problems.

Many Bayesian approaches to analyzing spatial disease patterns focus on mapping spatially smoothed disease rates (for example, Clayton and Kaldor (1987), Besag et al. (1991) and Waller et al. (1997a)). Mapping methods produce stable estimates for cell-specific rates by borrowing strength from neighboring cells. These are most useful for capturing gradual, regional changes in disease rates, and are less useful in detecting abrupt, localized changes indicative of hot spot clustering. The models proposed by Besag et al. (1991) and Waller et al. (1997a) incorporate both spatially structured (spatial correlation) and unstructured (extra-Poisson variation) heterogeneity in one model. The ability of these models to detect localized clusters is questionable, because they incorporate only a global clustering mechanism. In addition, typically, the spatially structured and unstructured components of the heterogeneity are not separately identified by the likelihood (Waller et al. 1997a).

A few Bayesian approaches more directly address the disease clustering problem, including Lawson (1995), Lawson and Clark (1999b), and Gangnon and Clayton (2000). Lawson (1995) proposes a point process model for detection of cluster locations when exact case (and control) locations are known. Lawson (2000) describes an extension of this model to incorporate both localized clustering and general spatial heterogeneity of disease rates. Lawson and Clark (1999b) describe the application of a point process clustering model to case count data through data augmentation. To apply their model, one imputes locations for each member of the population at risk, typically by assuming a uniform spatial distribution within each cell, to produce a point process. One then proposes a clustering model for the point process. Gangnon and Clayton (2000) propose a model for clustering using cell count data in which the study region is divided into several components: a large background area and a relatively small number of clusters where a common rate (or covariate-adjusted risk) is assumed within each component.

Knorr-Held and Rasser (2000) and Denison and Holmes (2001) consider a nonparametric Bayesian framework for modelling cell count data. Although superficially similar to the Gangnon and Clayton (2001) model in that cells are grouped into components of constant risk, the models of Knorr-Held and Rasser (2000) and Denison and Holmes (2001) serve a very different goal. In the models of Knorr-Held and Rasser (2000) and Denison and Holmes (2001), the components (or clusters) of cells primarily serve as a tool for estimating the underlying risk surface, not as parameters of direct interest. In the model of Gangnon and Clayton (2000), the location and composition of the cluster of cells is of primary interest.

In Section 8.2, we describe the model for clustering originally proposed by Gangnon and Clayton (2000). In Section 8.3, we review a randomized variant on a backwards selection algorithm useful for approximating the posterior distribution on cluster models. In Section 8.4, we illustrate the use of these modelling techniques through the construction of maps of the 1995 mortality rates from five cancers (breast, cervical, colon, lung and stomach) in the United States.
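
As a rough illustration of the component structure described above for cell count data (a large background area plus a small number of clusters, with a common rate within each component), the following sketch may help fix ideas. It is not the Bayesian procedure developed in this chapter: it assumes Poisson cell counts with known expected counts, estimates each component's relative risk by maximum likelihood, and compares a one-cluster model with the common-rate null model. The function names, the labels and the toy data are all hypothetical.

```python
import math


def component_rates(observed, expected, labels):
    """Maximum-likelihood relative risk per component: sum(observed) / sum(expected)."""
    totals = {}
    for obs, exp, label in zip(observed, expected, labels):
        obs_sum, exp_sum = totals.get(label, (0.0, 0.0))
        totals[label] = (obs_sum + obs, exp_sum + exp)
    return {label: obs_sum / exp_sum for label, (obs_sum, exp_sum) in totals.items()}


def poisson_log_likelihood(observed, expected, labels, rates):
    """Poisson log-likelihood when every cell in a component shares one relative risk.

    Assumes every fitted mean is strictly positive (true for the toy data below).
    """
    total = 0.0
    for obs, exp, label in zip(observed, expected, labels):
        mean = rates[label] * exp
        total += obs * math.log(mean) - mean - math.lgamma(obs + 1)
    return total


# Hypothetical toy data: six cells, the last two forming a single cluster.
observed = [10, 12, 9, 11, 25, 27]
expected = [10.0, 12.0, 10.0, 10.0, 10.0, 12.0]
cluster_labels = ["background"] * 4 + ["cluster"] * 2
null_labels = ["background"] * 6  # common rate across the whole study region

cluster_rates = component_rates(observed, expected, cluster_labels)
null_rates = component_rates(observed, expected, null_labels)

ll_cluster = poisson_log_likelihood(observed, expected, cluster_labels, cluster_rates)
ll_null = poisson_log_likelihood(observed, expected, null_labels, null_rates)

print("relative risks under the cluster model:", cluster_rates)
print("log-likelihood gain over the common-rate model:", ll_cluster - ll_null)
```

In this toy example the cluster component receives a clearly elevated relative risk while the background stays at one. A full analysis would additionally place priors on the number, location and composition of clusters and work with the resulting posterior distribution rather than a single plug-in fit.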