Modeling the distribution of distance data in Euclidean space
Abstract
Phylogenetic inference, the derivation of a hypothesis for the common evolutionary history of a group of species, is an active area of research at the intersection of biology, computer science, mathematics, and statistics. One assumes that the data contain a phylogenetic signal that will be recovered with varying accuracy, depending on the quality of both the method and the data. The input for distance-based inference methods is an element of a Euclidean space with coordinates indexed by the pairs of organisms. For several algorithms there exists a subdivision of this space into polyhedral cones such that inputs in the same cone return the same tree topology. The geometry of these cones has been used to analyze the inference algorithms. In this chapter, we model how input data points drawn from DNA sequences are distributed throughout Euclidean space in relation to the space of tree metrics, which can itself be described as a collection of polyhedral cones.
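To make the notion of an input point concrete, the following is a minimal sketch, not the chapter's own method, of how a set of aligned DNA sequences could be turned into a point whose coordinates are indexed by pairs of organisms. The Jukes-Cantor (JC69) correction is an assumption chosen for illustration, since the abstract does not name a distance estimator, and the function names `jc69_distance`, `distance_point`, and `four_point_holds` are hypothetical. The last function tests the classical four-point condition, which characterizes tree metrics: for any four taxa, the two largest of the three pairwise sums must be equal.

```python
import math
from itertools import combinations


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned sequences (assumed model)."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    mismatches = sum(x != y for x, y in zip(seq_a, seq_b))
    p = mismatches / len(seq_a)  # proportion of differing sites
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_point(sequences):
    """Map {taxon: sequence} to a point with one coordinate per pair of taxa."""
    taxa = sorted(sequences)
    return {
        (a, b): jc69_distance(sequences[a], sequences[b])
        for a, b in combinations(taxa, 2)
    }


def four_point_holds(point, quartet, tol=1e-9):
    """Check the four-point condition on one quartet of taxa."""

    def d(x, y):
        # Coordinates are stored once per unordered pair.
        return point[(x, y)] if (x, y) in point else point[(y, x)]

    a, b, c, e = quartet
    sums = sorted([d(a, b) + d(c, e), d(a, c) + d(b, e), d(a, e) + d(b, c)])
    # A tree metric requires the two largest sums to coincide.
    return sums[2] - sums[1] <= tol
```

For n taxa the resulting point has n(n-1)/2 coordinates. Distances estimated from finite sequences will generally fail the four-point check by a small margin, which is one way to see why the position of such points relative to the space of tree metrics is of interest.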
- Publication: arXiv e-prints
- Pub Date: June 2016
- DOI: 10.48550/arXiv.1606.06146
- arXiv: arXiv:1606.06146
- Bibcode: 2016arXiv160606146D
- Keywords: Quantitative Biology - Populations and Evolution
- E-Print: To appear in the AMS Contemporary Mathematics Series