Information-theoretic View of Sequence Organization in a Genome
Abstract
Sequence organization in a genome is viewed from two perspectives: informational redundancy, or informational correlation (IC), and k-mer frequency statistics. Two problems are investigated. The first is how the ICs exceed the fluctuation bound, so that order emerges from fluctuation, once the sequence length reaches a critical value. We demonstrate that the transition from fluctuation to order takes place at a sequence length of about 200-300 thousand bases for the human and E. coli genomes. This means that life emerges from a region between the macroscopic and the microscopic. The second is the statistical law governing k-mer organization in a genome under evolutionary pressure and functional selection. We derive a sum rule Q(k,N) for the deviations of k-mer frequencies from randomness in a genome sequence of length N, and derive the dependence of Q(k,N) on k and N. We find that Q(k,N) increases with N at a constant rate for most genome sequences, and demonstrate that ordering takes place when the functional selection of k-mers accumulates to a critical value. An important finding is that the sum rule correlates with the evolutionary complexity of the genome.
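The abstract does not give the definition of Q(k,N), so the following Python sketch is only an illustration of the general idea of measuring how far k-mer frequencies in an N-long sequence deviate from randomness. It assumes a uniform random model over the four bases and uses a chi-square-style sum as a stand-in; this is not the paper's sum rule, and the function names `kmer_counts` and `deviation_sum` are hypothetical.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(ALPHABET):
            counts[kmer] += 1
    return counts


def deviation_sum(seq, k):
    """Illustrative statistic (an assumption, not the paper's Q(k,N)).

    Sums (observed - expected)**2 / expected over all 4**k possible k-mers,
    where expected is the count each k-mer would have if all k-mers were
    equally likely.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    expected = total / 4 ** k
    return sum(
        (counts.get("".join(p), 0) - expected) ** 2 / expected
        for p in product(ALPHABET, repeat=k)
    )
```

Evaluating `deviation_sum` on prefixes of increasing length N of a genome sequence would give a rough picture of how such a deviation measure grows with N, in the spirit of the relation between Q(k,N) and N described above.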
- Publication: arXiv e-prints
- Pub Date: April 2010
- DOI: 10.48550/arXiv.1004.3843
- arXiv: arXiv:1004.3843
- Bibcode: 2010arXiv1004.3843L
- Keywords: Quantitative Biology - Genomics
- E-Print: 19 pages, 4 figures