ABSTRACT

Department of Scientific Computing, Florida State University, Tallahassee, Florida, USA

Peter Beerli

Department of Scientific Computing, Florida State University, Tallahassee, Florida, USA

CONTENTS

9.1 Introduction
9.2 Bayesian inference of independent loci
9.2.1 What K represents qualitatively
9.2.2 Calculating K
9.3 Model comparison using our independent marginal likelihood sampler
9.4 Conclusion
Acknowledgments

Bayesian inference has changed the study of phylogenetics and population genetics. Just a few years ago, researchers using probabilistic methods had to justify choosing them over parsimony-based tree inference in phylogenetics and over allele-frequency-based methods in population genetics. Molecular phylogenetics seems to have accepted Bayesian and maximum likelihood methods more readily than population genetics: today it is common to find phylogenetic reports that employ only probabilistic methods, whereas population genetics reports that use probabilistic methods without also reporting summary statistics are rare. We assume this difference arose mostly because in phylogenetics usually only one marker, a long stretch of DNA, was collected from many different species. This made it rather simple to develop statistical methods and to focus on the mutation model that changes the sequence data over evolutionary time, which in turn led to the development of a large number of different mutation models and variants. These models considered, for example, site rate variation and coding versus non-coding sequences.

Population genetics, on the other hand, focused on allele frequencies among many sampling locations of a single species. Once sequencing was feasible for many individuals, however, it became obvious that sequencing the same stretch of DNA from many individuals in a single population contributes little additional information because most individuals are identical by descent. The allozyme era of the ’80s had revealed, however, that populations show many differences if we are willing to look at many loci. This led to a search for cheap markers, such as microsatellites and single nucleotide polymorphisms (SNPs). Recently, studies on non-model organisms that use many stretches of DNA sequence have emerged.