Large-scale simulation of RNA macroevolution by an energy-dependent fitness model
Abstract
Simulated nucleotide sequences are widely used in theoretical and empirical molecular evolution studies. Conventional simulators generally use fixed-parameter, time-homogeneous Markov models of sequence evolution. In this work, we use the folding free energy of an RNA's secondary structure as a proxy for its phenotypic fitness, and simulate RNA macroevolution with a mutation-selection population genetics model. Because this two-step process is conditioned on the current RNA and its ensemble of mutants, there is no global substitution matrix, and we do not explicitly assume one for this inhomogeneous stochastic process. After introducing the base model of RNA evolution, we outline a heuristic implementation algorithm and several model improvements. We then discuss the calibration of the model parameters and demonstrate that, in phylogeny reconstruction with both the parsimony and the likelihood methods, the sequences generated by our simulator, rnasim, have greater statistical complexity than those generated by two standard simulators, ROSE and Seq-Gen, and are close to empirical sequences.
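The mutation-selection step described above can be illustrated with a minimal sketch. The abstract does not specify the fitness function, the population size, or the folding algorithm, so everything in this sketch is an assumption and not rnasim's actual algorithm. `toy_energy` is a hypothetical stand-in for a real folding free energy: it pairs base i with base n-1-i in a rigid hairpin. Fitness is assumed to be exp(-beta * E) with a hypothetical inverse temperature `beta`. Each point mutant is weighted by Kimura's fixation probability for a haploid Wright-Fisher population of assumed size `N`.

```python
import math
import random
from typing import List

BASES = "ACGU"

# Hypothetical pair energies (kcal/mol) for a rigid hairpin. This is NOT a real
# nearest-neighbour model; it only stands in for a folding free energy.
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}


def toy_energy(seq: str) -> float:
    """Pair base i with base n-1-i and sum the pair energies (lower = more stable)."""
    n = len(seq)
    return sum(PAIR_ENERGY.get((seq[i], seq[n - 1 - i]), 0.0) for i in range(n // 2))


def point_mutants(seq: str) -> List[str]:
    """All 3 * len(seq) single-nucleotide mutants, i.e. the mutant ensemble of seq."""
    return [seq[:i] + b + seq[i + 1:] for i, c in enumerate(seq) for b in BASES if b != c]


def fixation_probability(s: float, N: int) -> float:
    """Kimura's diffusion approximation for one new mutant in a haploid
    Wright-Fisher population of size N with selection coefficient s."""
    if abs(s) < 1e-12:
        return 1.0 / N  # neutral limit
    if -2.0 * N * s > 700.0:
        return 0.0  # strongly deleterious: result underflows, so skip exp overflow
    # expm1 keeps precision when s is small: (1 - e^{-2s}) / (1 - e^{-2Ns})
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * N * s)


def substitute(seq: str, rng: random.Random, beta: float = 0.5, N: int = 100) -> str:
    """One substitution: weight every point mutant by its fixation probability
    under the assumed fitness exp(-beta * E), then draw the next sequence."""
    e_wt = toy_energy(seq)
    mutants = point_mutants(seq)
    # Selection coefficient s = f_mut / f_wt - 1 under the assumed fitness.
    weights = [
        fixation_probability(math.exp(-beta * (toy_energy(m) - e_wt)) - 1.0, N)
        for m in mutants
    ]
    return rng.choices(mutants, weights=weights, k=1)[0]


def evolve(seq: str, steps: int, seed: int = 0) -> List[str]:
    """Apply `steps` successive substitutions and return the whole trajectory."""
    rng = random.Random(seed)
    path = [seq]
    for _ in range(steps):
        path.append(substitute(path[-1], rng))
    return path


if __name__ == "__main__":
    for s in evolve("GGGAAACCC", steps=5, seed=1):
        print(s, toy_energy(s))
```

The weights are recomputed from the current sequence's own mutant ensemble at every step. As a result, the transition probabilities change as the sequence evolves, and no single global substitution matrix describes the process. This is the property the abstract emphasizes.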
- Publication: arXiv e-prints
- Pub Date: December 2009
- DOI: 10.48550/arXiv.0912.2326
- arXiv: arXiv:0912.2326
- Bibcode: 2009arXiv0912.2326G
- Keywords: Quantitative Biology - Populations and Evolution; Quantitative Biology - Biomolecules