How much can evolved characters tell us about the tree that generated them?
Abstract
In this paper we review some recent results that shed light on a fundamental question in molecular systematics: how much phylogenetic 'signal' can we expect from characters that have evolved under some Markov process? There are many sides to this question, and we begin by describing some explicit bounds on the probability of correctly reconstructing an ancestral state from the states observed at the tips. We show how these bounds set upper limits on the probability of tree reconstruction from aligned sequences, and we provide some new extensions that allow site-to-site rate variation or a covarion mechanism. We then explore the relationship between the number of sites required for accurate tree reconstruction and other model parameters, such as the number of species and the substitution probabilities, and we describe a phase transition that occurs when substitution probabilities exceed a critical value. In the remainder of the paper we turn to models of character evolution where the state space is assumed to be either infinite or very large. These models have some relevance to certain types of genomic data (such as gene order), and here we again investigate how many characters are required for accurate tree reconstruction.
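As a rough illustration of the ancestral-state question raised above, the following sketch simulates one simple special case. Everything specific in it is an assumption made for illustration and is not taken from the abstract: a symmetric two-state Markov model, a complete binary tree with the same substitution probability `p` on every edge, and a majority-vote estimate of the root state. The constant `CRITICAL_P` is the classical threshold for this particular model, where 2(1 - 2p)^2 = 1; the abstract itself does not state a value.

```python
import random

# Classical threshold for the symmetric two-state model on a complete
# binary tree: 2 * (1 - 2p)^2 = 1. Illustrative assumption, not a value
# quoted from the abstract.
CRITICAL_P = (1 - 2 ** -0.5) / 2


def simulate_leaves(depth, p, root_state, rng):
    """Evolve a binary state down a complete binary tree of the given depth.

    Each edge flips the parent's state independently with probability p.
    Returns the list of 2**depth leaf states.
    """
    states = [root_state]
    for _ in range(depth):
        next_states = []
        for s in states:
            for _ in range(2):  # two children per node
                next_states.append(1 - s if rng.random() < p else s)
        states = next_states
    return states


def majority_accuracy(depth, p, trials=2000, seed=0):
    """Estimate how often a majority vote over the leaves recovers the root.

    Ties count as half a success (equivalent to a fair coin toss).
    """
    rng = random.Random(seed)
    correct = 0.0
    for _ in range(trials):
        root = rng.randrange(2)
        leaves = simulate_leaves(depth, p, root, rng)
        ones = sum(leaves)
        zeros = len(leaves) - ones
        if ones == zeros:
            correct += 0.5
        else:
            guess = 1 if ones > zeros else 0
            correct += 1.0 if guess == root else 0.0
    return correct / trials


if __name__ == "__main__":
    # Compare substitution probabilities below and above the threshold
    # as the tree gets deeper.
    for p in (0.05, 0.25):
        for depth in (2, 4, 6, 8):
            acc = majority_accuracy(depth, p, trials=1000)
            print(f"p={p:.2f} depth={depth} accuracy={acc:.3f}")
```

In this toy setting, running the script with `p` on either side of `CRITICAL_P` gives a feel for the phase transition: the estimated accuracy can be compared across depths to see whether it stays clearly above 1/2 or drifts toward it as the tree deepens.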
- Publication: arXiv e-prints
- Pub Date: June 2004
- arXiv: arXiv:q-bio/0406048
- Bibcode: 2004q.bio.....6048M
- Keywords: Quantitative Biology - Populations and Evolution; Mathematics - Statistics Theory