ABSTRACT

Eduardo García-Portugués,1 Michael Golden,2 Michael Sørensen,3 Kanti V. Mardia,2,4 Thomas Hamelryck,5,6 Jotun Hein2

1Department of Statistics, Carlos III University of Madrid (Spain)
2Department of Statistics, University of Oxford (UK)
3Department of Mathematical Sciences, University of Copenhagen (Denmark)
4Department of Mathematics, University of Leeds (UK)
5Department of Biology, University of Copenhagen (Denmark)
6Department of Computer Science, University of Copenhagen (Denmark)

4.1 Introduction
4.1.1 Protein Structure
4.1.2 Protein Evolution
4.1.3 Toward a Generative Model of Protein Evolution

4.2 Toroidal Diffusions
4.2.1 Toroidal Ornstein-Uhlenbeck Analogues
4.2.2 Estimation for Toroidal Diffusions
4.2.3 Empirical Performance

4.3 ETDBN: An Evolutionary Model for Protein Pairs
4.3.1 Hidden Markov Model Structure
4.3.2 Site-Classes: Constant Evolution and Jump Events
4.3.3 Model Training
4.3.4 Benchmarks

4.4 Case Study: Detection of a Novel Evolutionary Motif
4.5 Conclusions

Acknowledgments
Bibliography


Toroidal diffusions, that is, continuous-time Markov processes on the torus, are useful statistical tools for modelling the evolution of a protein's backbone through its dihedral-angle representation. This chapter reviews a class of time-reversible ergodic diffusions that can be regarded as toroidal analogues of the celebrated Ornstein-Uhlenbeck process, and presents their application to the construction of an evolutionary model for pairs of related proteins. The model aims to provide new insights into the relationship between protein sequence evolution and structure evolution.
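To make the idea of a toroidal Ornstein-Uhlenbeck analogue concrete, here is a minimal sketch, not the chapter's exact models. It assumes each dihedral angle follows the hypothetical circular drift alpha * sin(mu - theta). Near mu this drift linearizes to the Ornstein-Uhlenbeck drift alpha * (mu - theta). Because it is the Langevin drift (sigma^2 / 2) * d/dtheta log f of a von Mises density f with concentration kappa = 2 * alpha / sigma^2, the process is time-reversible and ergodic with that stationary law. The path is simulated by Euler-Maruyama and wrapped back onto [-pi, pi). The two angles are kept independent, which is a simplifying assumption: the chapter's diffusions can couple the angles. All function names are illustrative.

```python
import math
import random


def wrap_angle(theta: float) -> float:
    """Map an angle onto [-pi, pi) (up to floating-point rounding at the boundary)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def simulate_toroidal_vm_diffusion(theta0, mu, alpha, sigma, dt, n_steps, seed=None):
    """Euler-Maruyama path on the torus, one component per dihedral angle.

    Hypothetical model: each component obeys
        d(theta_i) = alpha_i * sin(mu_i - theta_i) dt + sigma_i dW_i,
    wrapped onto [-pi, pi). Components are independent (no angle coupling).
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    theta = [wrap_angle(t) for t in theta0]
    path = [tuple(theta)]
    for _ in range(n_steps):
        # Drift pulls each angle toward mu_i along the shortest arc; noise is Gaussian.
        theta = [
            wrap_angle(t + a * math.sin(m - t) * dt + s * sqrt_dt * rng.gauss(0.0, 1.0))
            for t, m, a, s in zip(theta, mu, alpha, sigma)
        ]
        path.append(tuple(theta))
    return path


def circular_mean_and_length(angles):
    """Return the mean direction and the mean resultant length of a sample of angles."""
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return math.atan2(s, c), math.hypot(c, s)
```

For example, `simulate_toroidal_vm_diffusion((3.0, -3.0), (-1.2, 2.4), (1.0, 1.0), (1.0, 1.0), 0.01, 100_000, seed=1)` should produce a (phi, psi) trajectory whose long-run circular means settle near (-1.2, 2.4). Wrapping matters for psi, whose mean lies close to pi. Taking the circular mean rather than the arithmetic mean of the angles avoids the artefact of averaging values on either side of the +-pi cut.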