ABSTRACT

This chapter provides an overview of the use of Markov chain Monte Carlo (MCMC) methods in the analysis of data observed for multiple genetic loci on members of extended pedigrees in which there are many missing data. Rather than on the details of the MCMC sampling methods, our focus is, first, on the complex structure of these data that necessitates MCMC methods and, second, on the use of Monte Carlo realizations of latent variables in statistical inference in this area. MCMC should be a weapon of last resort, used only when exact computation and other Monte Carlo methods fail. When MCMC is needed, there are two prerequisites for its efficient use in complex stochastic systems. The first is a consideration of the conditional independence structure of the data observations and latent variables, and a choice of latent variable structure that will facilitate computation and sampling. While unnecessary augmentation of the latent variable space is clearly disadvantageous, there are classic cases where augmentation of the space greatly improves efficiency (Besag and Green, 1993). Second, and related, it is important to consider which parts of a computation may be performed exactly. Where a partial exact computation is feasible, it may be used to resample subsets of the latent variables jointly, and hence improve MCMC performance. Additionally, partial exact computation may permit the use of Rao-Blackwellized estimators (Gelfand and Smith, 1990), improving efficiency in the use of sampled realizations. Thus, in Section 13.3 we consider the structures and exact computational algorithms that will complement MCMC approaches. As genetic marker data on observable individuals increase, and the traits requiring analysis become genetically more complex, the challenges both for exact computation and for MCMC methods increase also. In Section 13.4, we describe MCMC samplers of genetic latent variables that have evolved from the single-site genotypic updating samplers of Sheehan (2000) to the most recent multiple-meiosis and multiple-locus sampling of inheritance patterns of Tong and Thompson (2008). The separation of the analysis of trait data from the MCMC sampling of latent variables conditional on genetic marker data was first proposed by Lange and Sobel (1991). With the increasing complexity of models for trait data, this has become the approach of choice, and in Section 13.5 we discuss the sampling of latent inheritance patterns conditional only on dense marker data. In some cases, the model on which sampling is based is too simple even to approximate reality. Importance-sampling reweighting then becomes a key tool in improving the usefulness of this approach. Also in the arena of marker-data-based analyses is the question of genetic map estimation (Section 13.5.2).
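As a toy illustration of the importance-sampling reweighting mentioned above (a generic sketch, not the chapter's own algorithm), suppose realizations of a latent variable Z are drawn under a simplified sampling model q, but expectations are wanted under a richer target model p(z) proportional to q(z) L(z), where L is an extra likelihood factor (for example, from trait data) that the sampling model ignores. Each realization is then weighted by w(z) = L(z), and the unknown normalizing constant cancels in the self-normalized ratio. Everything here is hypothetical: the three-state latent variable, the arrays `Q` and `LIK`, and the helper names are assumptions for illustration, and independent draws stand in for (dependent) MCMC output.

```python
import random

# Hypothetical three-state latent variable Z in {0, 1, 2}, standing in for
# some latent feature sampled under a simplified model.
STATES = [0, 1, 2]
# Sampling distribution q(z): the "too simple" model actually sampled from.
Q = [0.25, 0.5, 0.25]
# Hypothetical extra likelihood factor L(z) ignored by the sampling model;
# the target is p(z) proportional to q(z) * L(z).
LIK = [0.2, 0.5, 0.9]


def draw_from_q(n, seed=1):
    """Draw n realizations from q (independent here; MCMC output in practice)."""
    rng = random.Random(seed)
    return rng.choices(STATES, weights=Q, k=n)


def reweighted_mean(samples, f):
    """Self-normalized importance-sampling estimate of E_p[f(Z)].

    The weight is p_unnorm(z) / q(z) = L(z); the normalizing constant of p
    cancels between numerator and denominator.
    """
    weights = [LIK[z] for z in samples]
    total = sum(weights)
    estimate = sum(w * f(z) for w, z in zip(weights, samples)) / total
    # Kish effective sample size: roughly how many equally weighted draws
    # the weighted sample is worth.
    ess = total ** 2 / sum(w * w for w in weights)
    return estimate, ess


def exact_mean(f):
    """Exact E_p[f(Z)] by enumerating the three states, for comparison."""
    unnorm = [q * lik for q, lik in zip(Q, LIK)]
    return sum(u * f(z) for u, z in zip(unnorm, STATES)) / sum(unnorm)


if __name__ == "__main__":
    zs = draw_from_q(100_000)
    est, ess = reweighted_mean(zs, lambda z: z)
    print(f"reweighted E[Z] = {est:.4f}, exact = {exact_mean(lambda z: z):.4f}")
    print(f"effective sample size ~ {ess:.0f} of {len(zs)}")
```

In this toy setting the exact target mean is 0.7 / 0.525 = 4/3, and the reweighted estimate from the q-draws should land close to it. The effective sample size indicates how much information the reweighting costs; when the sampling model is far from the target, a few large weights dominate and the effective sample size collapses, which is why reweighting helps most when the sampling model is a reasonable, if simplified, approximation.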