ABSTRACT
Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
CONTENTS
12.1 Introduction
12.2 Independent sites models and summary statistics
    12.2.1 Likelihood inference
    12.2.2 The EM algorithm
    12.2.3 Bayesian inference
    12.2.4 Conditional means on a phylogeny
    12.2.5 Endpoint-conditioned summary statistics from uniformization
    Example
12.3 Dependent-site models and Markov chain Monte Carlo
    12.3.1 Gibbs sampling with context dependence
    12.3.2 Path sampling
        12.3.2.1 Rejection sampling
        12.3.2.2 Direct sampling
        12.3.2.3 Uniformization
        12.3.2.4 Bisectioning
    12.3.3 Metropolis-Hastings algorithm with dependence
12.4 Future directions for sequence paths with dependence models
Acknowledgments
While some probabilistic models of DNA or protein sequence change are not based on an instantaneous rate matrix (e.g., Barry and Hartigan, 1987), most are. An instantaneous rate matrix makes it possible to go beyond the sequences that begin and end a branch on a phylogenetic tree: inferences can be made about the sequence changes that occurred between these endpoints. At the most detailed level, inferences would concern which changes transformed the beginning sequence into the ending one and exactly when these changes occurred. At a less detailed level, various summary statistics about the evolutionary trajectory from the beginning to the end of the branch might be of interest. A variety of techniques are available for making inferences about evolutionary trajectories conditional upon the endpoints of a branch, and one objective of this chapter is to introduce them. By analogy with the “Brownian bridge” that results when a Brownian motion process is conditioned upon its endpoints, an endpoint-conditioned Markov process is known as a Markov bridge (Al-Hussaini and Elliot, 1989). This chapter is not intended to be comprehensive regarding inference techniques for Markov bridges. Instead, the focus is on endpoint-conditioning with Markov models for molecular sequence evolution.
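To make the idea of a Markov bridge concrete, the following is a minimal sketch of drawing one endpoint-conditioned trajectory for a single nucleotide site by rejection: paths are simulated forward from the beginning state and kept only if they finish in the required ending state. The Jukes-Cantor rate matrix, the branch length, and all function names (`jc69_rates`, `simulate_path`, `sample_bridge`) are illustrative assumptions rather than anything specified in the text.

```python
import random

STATES = "ACGT"


def jc69_rates(mu=1.0):
    """Off-diagonal rates of a Jukes-Cantor matrix (assumed model for this sketch).

    rates[a][b] is the instantaneous rate of change from a to b, for a != b.
    """
    return {a: {b: mu / 3.0 for b in STATES if b != a} for a in STATES}


def simulate_path(rates, start, T, rng):
    """Forward-simulate a continuous-time Markov chain on [0, T].

    Returns (jumps, end_state), where jumps is a list of (time, new_state).
    """
    t, state, jumps = 0.0, start, []
    while True:
        out = rates[state]
        total = sum(out.values())
        # Waiting time until the next change is exponential with the exit rate.
        t += rng.expovariate(total)
        if t >= T:
            return jumps, state
        # Choose the next state in proportion to its rate.
        targets = list(out)
        state = rng.choices(targets, weights=[out[b] for b in targets])[0]
        jumps.append((t, state))


def sample_bridge(rates, start, end, T, rng, max_tries=100_000):
    """Rejection sampling: keep the first forward path that ends in `end`."""
    for _ in range(max_tries):
        jumps, final = simulate_path(rates, start, T, rng)
        if final == end:
            return jumps
    raise RuntimeError("no accepted path; endpoints may be too improbable")


if __name__ == "__main__":
    rng = random.Random(0)
    path = sample_bridge(jc69_rates(), "A", "G", 0.5, rng)
    for time, state in path:
        print(f"t = {time:.3f}: change to {state}")
```

Each accepted path records which changes occurred and when, which is the most detailed level of inference described above. Summary statistics such as the number of changes can be read off the returned list. Rejection becomes wasteful when the required ending state is improbable given the beginning state and branch length.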