Species Trees are Recoverable from Unrooted Gene Tree Topologies Under a Constant Rate of Horizontal Gene Transfer
Abstract
Reconstructing the tree of life from molecular sequences is a fundamental problem in computational biology. Modern data sets often contain a large number of genes, which can complicate the reconstruction problem because different genes may undergo different evolutionary histories. This is the case in particular in the presence of horizontal genetic transfer (HGT), where a gene is inherited from a distant species rather than an immediate ancestor. Such an event produces a gene tree which is distinct from, but related to, the species phylogeny. In previous work, a natural stochastic model of HGT was introduced and studied. It was shown, both in simulation and theoretical studies, that a species phylogeny can be reconstructed from gene trees despite surprisingly high rates of HGT under this model. Rigorous lower and upper bounds on this achievable rate were also obtained, but a large gap remained. Here we close this gap, up to a constant. Specifically, we show that a species phylogeny can be reconstructed correctly from gene trees even when, on each gene, each edge of the species tree has a constant probability of being the location of an HGT event. Our new reconstruction algorithm, which relies only on unrooted gene tree topologies, builds the tree recursively from the leaves and runs in polynomial time. We also provide a matching bound in the negative direction (up to a constant) and extend our results to some cases where gene trees are not perfectly known.
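To give a rough sense of scale for the constant-probability regime described in the abstract, the following is a minimal back-of-the-envelope sketch, not taken from the paper. It assumes a rooted binary species tree with n leaves (which has 2n - 2 edges) and a hypothetical per-edge probability `p` that an edge is the location of an HGT event on a given gene. By linearity of expectation, the expected number of affected edges per gene is then p(2n - 2), so it grows linearly with the number of species. The function names and the value of `p` are illustrative assumptions only.

```python
def num_edges_rooted_binary(n_leaves: int) -> int:
    """Number of edges in a rooted binary tree with n_leaves leaves (2n - 2)."""
    if n_leaves < 2:
        raise ValueError("a rooted binary tree needs at least 2 leaves")
    return 2 * n_leaves - 2


def expected_hgt_edges(n_leaves: int, p: float) -> float:
    """Expected number of edges hit by an HGT event on a single gene.

    Assumption (illustrative, not from the paper): every edge is hit with
    the same probability p. Linearity of expectation gives p * (2n - 2),
    with no independence assumption needed.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return p * num_edges_rooted_binary(n_leaves)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    for n in (10, 100, 1000):
        print(n, expected_hgt_edges(n, p=0.05))
```

Under these assumptions, even a small constant `p` means that a typical gene tree on many species has been affected by many transfer events, which is why reconstruction in this regime is notable.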
- Publication:
- arXiv e-prints
- Pub Date:
- August 2015
- DOI:
- 10.48550/arXiv.1508.01962
- arXiv:
- arXiv:1508.01962
- Bibcode:
- 2015arXiv150801962D
- Keywords:
- Mathematics - Probability;
- Computer Science - Computational Engineering, Finance, and Science;
- Quantitative Biology - Populations and Evolution
- E-Print:
- Submitted. Conference version published as: Daskalakis, Constantinos, and Sebastien Roch. "Species trees from gene trees despite a high rate of lateral genetic transfer: A tight bound." Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2016