ABSTRACT

Department of Statistics, University of Connecticut, Storrs, Connecticut, USA

Ming-Hui Chen

Department of Statistics, University of Connecticut, Storrs, Connecticut, USA

Lynn Kuo

Department of Statistics, University of Connecticut, Storrs, Connecticut, USA

Paul O. Lewis

Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut, USA

CONTENTS

6.1 Introduction
6.2 Notation and definitions
    6.2.1 Consistency of VHM
    6.2.2 Consistency of VSS
    6.2.3 Consistency of VIDR
6.3 Empirical example
    6.3.1 Transformations used for VIDR
    6.3.2 Topology-specific marginal likelihood estimation
    6.3.3 Total marginal likelihood estimation
6.4 Discussion
6.5 Funding
Acknowledgments


Model selection in Bayesian phylogenetics, as in Bayesian statistics in general, often centers on the marginal likelihood, defined as the probability of the data given only the model, marginalized over all model parameters. Models with larger marginal likelihoods fit the data better and are therefore preferred over models with smaller marginal likelihoods. Several methods have been used to estimate the marginal likelihood in phylogenetics, including the harmonic mean (HM) method (Newton and Raftery, 1994a), thermodynamic integration (TI) (Lartillot and Philippe, 2006a), the stepping-stone (SS) method (Xie et al., 2011), the generalized stepping-stone method (Fan et al., 2011) (see also Chapter 5), and, most recently, the inflated density ratio (IDR) method (Arima and Tardella, 2012) (see also Chapter 3). (In this chapter, the abbreviation SS refers to generalized SS unless otherwise indicated.) Most of these approaches focus on estimating the marginal likelihood when the tree topology is fixed. Specifically, they estimate the marginal likelihood of tree T given model M, c(T|M) = f(y|T,M), where y denotes the data. In principle, all of these methods can also be used to estimate the marginal likelihood of the model alone (integrating over tree topology), c(M) = p(y|M), although to our knowledge they have not been proven consistent in this case. Here we prove that HM and SS are statistically consistent estimators of c(M), and we generalize the IDR method to the variable-topology case, proving that it too is a consistent estimator of c(M). We denote the variable tree topology versions of HM, SS, and IDR as VHM, VSS, and VIDR, respectively.
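To make the relationship between c(T|M) and c(M) concrete: by the law of total probability, the total marginal likelihood is the prior-weighted sum of the topology-specific marginal likelihoods, c(M) = Σ_T p(T|M) c(T|M). The Python sketch below evaluates this sum in log space. It only illustrates the definition and is not the VHM, VSS, or VIDR estimator; direct enumeration is feasible only when the number of topologies is small. The function name, the uniform topology prior, and the numerical values are hypothetical and chosen for illustration only.

```python
import math


def log_total_marginal_likelihood(log_c_by_topology, log_prior_by_topology):
    """Return log c(M) = log sum_T p(T|M) * c(T|M).

    Both arguments are sequences of natural-log values, one entry per
    tree topology. The log-sum-exp trick avoids numerical underflow,
    since phylogenetic log marginal likelihoods are typically large
    negative numbers.
    """
    if len(log_c_by_topology) != len(log_prior_by_topology):
        raise ValueError("need one log prior per topology")
    # log of each summand p(T|M) * c(T|M)
    terms = [lc + lp for lc, lp in zip(log_c_by_topology, log_prior_by_topology)]
    largest = max(terms)
    return largest + math.log(sum(math.exp(t - largest) for t in terms))


# Hypothetical example: three topologies with a uniform topology prior.
log_c = [-1000.0, -1002.0, -1005.0]   # assumed values of log c(T|M)
log_prior = [math.log(1.0 / 3.0)] * 3  # assumed uniform prior p(T|M)
log_c_M = log_total_marginal_likelihood(log_c, log_prior)
print(f"log c(M) = {log_c_M:.4f}")
```

Because the summands are combined on the log scale, the topology with the largest log c(T|M) dominates the result, and topologies whose marginal likelihoods are many log units smaller contribute almost nothing.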