Gradients do grow on trees: a linear-time ${\cal O}\hspace{-0.2em}\left( N \right)$-dimensional gradient for statistical phylogenetics

doi:10.48550/arXiv.1905.12146

Gradients do grow on trees: a linear-time ${\cal O}\hspace{-0.2em}\left( N \right)$-dimensional gradient for statistical phylogenetics

Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. Order ${\cal O}\hspace{-0.2em}\left( N \right)$-dimensional gradient calculations based on the standard pruning algorithm require ${\cal O}\hspace{-0.2em}\left( N^2 \right)$ operations where N is the number of sampled molecular sequences. With the advent of high-throughput sequencing, recent phylogenetic studies have analyzed hundreds to thousands of sequences, with an apparent trend towards even larger data sets as a result of advancing technology. Such large-scale analyses challenge phylogenetic reconstruction by requiring inference on larger sets of process parameters to model the increasing data heterogeneity. To make this tractable, we present a linear-time algorithm for ${\cal O}\hspace{-0.2em}\left( N \right)$-dimensional gradient evaluation and apply it to general continuous-time Markov processes of sequence substitution on a phylogenetic tree without a need to assume either stationarity or reversibility. We apply this approach to learn the branch-specific evolutionary rates of three pathogenic viruses: West Nile virus, Dengue virus and Lassa virus. Our proposed algorithm significantly improves inference efficiency with a 126- to 234-fold increase in maximum-likelihood optimization and a 16- to 33-fold computational performance increase in a Bayesian framework.

Publication:

arXiv e-prints

Pub Date:

May 2019

DOI:

10.48550/arXiv.1905.12146

arXiv:

arXiv:1905.12146

Bibcode:

2019arXiv190512146J

Keywords:

Statistics - Computation;
Quantitative Biology - Populations and Evolution;
Statistics - Methodology

NASA/ADS

Gradients do grow on trees: a linear-time ${\cal O}\hspace{-0.2em}\left( N \right)$-dimensional gradient for statistical phylogenetics

Abstract