Evidence for growth of microbial genomes by short segmental duplications
Abstract
We show that textual analysis of microbial genomes reveals telling footprints of the early evolution of the genomes. When a random DNA sequence is considered as a text in its four nucleotides, the frequencies of word occurrence are expected to obey Poisson distributions. For words shorter than nine letters, however, the average width of the distributions for complete microbial genomes is many times that of a Poisson distribution. We interpret this phenomenon as follows: the genome is a large system that possesses the statistical characteristics of a much smaller "random" system, and certain textual statistical properties of the genomes we now see are remnants of those of their ancestral genomes, which were much shorter than present-day genomes. This interpretation suggests a simple, biologically plausible model for the growth of genomes: the genome first grows randomly to an initial length of approximately one thousand nucleotides (1k nt), or about one thousandth of its final length, and thereafter grows mainly by random segmental duplication. We show that, with duplicated segments averaging around 25 nt, the model sequences generated possess statistical properties characteristic of present-day genomes. Both the initial length and the duplicated segment length support an RNA world at the time duplication began. Random segmental duplication would greatly enhance the ability of a genome to use its hard-to-acquire codes repeatedly, and a genome that practiced it would have evolved enormously faster than those that did not.
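The growth model and the width comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and several details are assumptions made for illustration only: uniform base composition, segment lengths drawn uniformly from 1 to 49 nt (so that they average 25 nt), copies inserted at a uniformly random position, a final length scaled down to 50k nt to keep the run short, and the width measured as the standard deviation of word counts divided by the square root of their mean (a ratio of 1 corresponds to a Poisson distribution). All function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def random_sequence(length, rng):
    """Random text with equal base probabilities (an assumption of this sketch)."""
    return "".join(rng.choices(ALPHABET, k=length))


def grow_by_duplication(seed, final_length, mean_segment, rng):
    """Grow `seed` to `final_length` by copying random segments to random sites.

    Segment lengths are uniform on [1, 2 * mean_segment - 1], which gives the
    requested mean; this distribution is an assumption, not taken from the paper.
    """
    seq = list(seed)
    while len(seq) < final_length:
        seg_len = rng.randint(1, 2 * mean_segment - 1)
        # Never copy more than exists, and never overshoot the target length.
        seg_len = min(seg_len, len(seq), final_length - len(seq))
        start = rng.randrange(len(seq) - seg_len + 1)
        segment = seq[start:start + seg_len]
        insert_at = rng.randrange(len(seq) + 1)
        seq[insert_at:insert_at] = segment
    return "".join(seq)


def width_ratio(seq, k):
    """Width of the k-letter word count distribution relative to Poisson.

    Counts overlapping words, includes words that never occur, and returns
    std / sqrt(mean); a Poisson distribution would give a value of 1.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    freqs = [counts.get("".join(w), 0) for w in product(ALPHABET, repeat=k)]
    mean = sum(freqs) / len(freqs)
    return statistics.pstdev(freqs) / math.sqrt(mean)


if __name__ == "__main__":
    rng = random.Random(0)
    seed = random_sequence(1000, rng)                 # initial random growth to 1k nt
    model = grow_by_duplication(seed, 50_000, 25, rng)
    control = random_sequence(50_000, rng)            # purely random text of equal length
    for k in (2, 3, 4):
        print(k, round(width_ratio(model, k), 2), round(width_ratio(control, k), 2))
```

Under these assumptions the duplicated sequence keeps the large relative fluctuations of its short seed, so its ratio sits well above 1, while the purely random control stays near 1. The sketch leaves out point mutations and any realistic base composition, so it should be read as a qualitative illustration rather than a reproduction of the paper's results.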
- Publication: arXiv e-prints
- Pub Date: February 2003
- arXiv: arXiv:physics/0302031
- Bibcode: 2003physics...2031H
- Keywords: Biological Physics; Genomics
- E-Print: 5 pages, 1 table, 2 figures