ABSTRACT

Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, PR China

Ziheng Yang

Department of Biology, University College London, London, United Kingdom

CONTENTS

2.1 Introduction
    2.1.1 The Bayesian paradigm
    2.1.2 Objective versus subjective priors
2.2 Estimation of distance between two sequences
    2.2.1 The maximum likelihood estimate (MLE)
    2.2.2 Uniform or flat priors
    2.2.3 The Jeffreys priors
    2.2.4 Reference priors
2.3 Priors on model parameters in Bayesian phylogenetics
    2.3.1 Priors on branch lengths
    2.3.2 Priors on parameters in substitution models
    2.3.3 Priors for heterogeneous substitution rates among sites and over time
    2.3.4 Partition and mixture models for analysis of large genomic datasets
2.4 Priors on the tree topology
    2.4.1 Prior on rooted trees
    2.4.2 Priors on unrooted trees and impossibility of equal clade probabilities
    2.4.3 Prior on trees and high posterior probabilities
2.5 Priors on times and rates for estimation of divergence times
2.6 Summary
Acknowledgment


2.1.1 The Bayesian paradigm

The key feature of Bayesian statistics is its use of probability distributions to represent the uncertainty in the parameters of the model. The distribution of the parameters before the collection and analysis of the data is called the prior, while the distribution incorporating the information in the data is called the posterior. In frequentist statistics, by contrast, a parameter is an unknown but fixed constant and cannot have a distribution. When prior knowledge of the parameters is available, Bayesian analysis provides a natural way to incorporate such information. When no such information is available, a vague or diffuse prior has to be used for Bayesian inference to proceed. The posterior distribution combines the information from the prior and the information from the data and forms the basis for all Bayesian inferences.
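The way the posterior combines prior and data can be illustrated with a small numerical sketch. The example below is not taken from the chapter: it assumes a binomial model with a single proportion parameter, made-up data (k = 7 successes in n = 20 trials), and two hypothetical priors, a flat one and an informative Beta(10, 30) one. The posterior is approximated on a grid as prior times likelihood, then normalised.

```python
def grid_posterior(prior, k, n, grid_size=1001):
    """Approximate the posterior of a binomial proportion on a uniform grid.

    prior: function giving the (possibly unnormalised) prior density at theta.
    k, n:  observed successes and number of trials (illustrative data).
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalised posterior = prior(theta) * likelihood(theta)
    unnorm = [prior(t) * t**k * (1 - t)**(n - k) for t in thetas]
    total = sum(unnorm)
    post = [u / total for u in unnorm]  # normalise so the weights sum to 1
    return thetas, post


def posterior_mean(thetas, post):
    """Mean of the discretised posterior."""
    return sum(t * p for t, p in zip(thetas, post))


def flat_prior(theta):
    # Vague prior: every value in [0, 1] equally plausible (assumption)
    return 1.0


def informative_prior(theta):
    # Kernel of a Beta(10, 30) density, standing in for prior knowledge (assumption)
    return theta**9 * (1 - theta)**29


k, n = 7, 20  # hypothetical data
mean_flat = posterior_mean(*grid_posterior(flat_prior, k, n))
mean_inf = posterior_mean(*grid_posterior(informative_prior, k, n))

print(f"posterior mean, flat prior:        {mean_flat:.4f}")
print(f"posterior mean, informative prior: {mean_inf:.4f}")
```

Because the Beta prior is conjugate to the binomial likelihood, the exact posterior means in this toy case are 8/22 under the flat prior and 17/60 under the Beta(10, 30) prior, and the grid values should agree closely with them. The same data thus lead to different posteriors under different priors, with the informative prior pulling the estimate toward its own mean of 0.25.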