ABSTRACT

Bayes’ rule gives the posterior probability of a hypothesis given the observed data, combining the likelihood of the data with a prior probability for the hypothesis. In phylogenetics, it gives the probability of a tree given the data and a model of evolution, with prior probabilities set for the events important to the model. Markov chain Monte Carlo (MCMC) simulation, linked to a specific likelihood model, is used to produce a posterior distribution of trees for use in Bayesian analysis. In a phylogenetic context, the model can be selected on the basis of any biologically meaningful partition of the data. Our treatment of models in Bayesian analysis differs very little from the way we would handle models in a likelihood framework. Priors are currently best treated as “flat” distributions, with two important exceptions: the Dirichlet distribution for state frequency priors, and branch length priors. The MCMC parameters are critical for understanding the efficiency of a Bayesian analysis, and their settings have a greater impact in large data sets. Increasing the number of generations the simulation is allowed to run leads to more complete exploration of tree space, but at higher computational cost. Close observation of the behavior of the chains can indicate when a Bayesian analysis has run efficiently and when a run should be ended.
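
For reference, Bayes’ rule in the phylogenetic setting can be written as below. The notation is an assumption introduced here for illustration and is not taken from the text: \tau is the tree topology, \nu the branch lengths, \theta the parameters of the substitution model, and X the aligned character data.

```latex
% Assumed notation: \tau = topology, \nu = branch lengths,
% \theta = substitution-model parameters, X = observed alignment.
P(\tau, \nu, \theta \mid X)
  = \frac{P(X \mid \tau, \nu, \theta)\, P(\tau, \nu, \theta)}{P(X)}
```

The denominator P(X) requires summing over all possible trees and integrating over all parameter values, which is not feasible to compute directly for realistic data sets. MCMC simulation avoids this by sampling from the posterior distribution without evaluating P(X).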