Exact asymptotic results for the Bernoulli matching model of sequence alignment
Abstract
Analytically determining the statistics of the longest common subsequence (LCS) of a pair of random sequences drawn from an alphabet of c letters is a challenging problem in computational evolutionary biology. We present exact asymptotic results for the distribution of the LCS length in a simpler, yet nontrivial, variant of the original model called the Bernoulli matching (BM) model. We show that in the BM model, for all c, the distribution of the asymptotic length of the LCS, suitably scaled, is identical to the Tracy-Widom distribution of the largest eigenvalue of a random matrix drawn from the Gaussian unitary ensemble.
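The abstract does not spell out either model, so the following Python sketch is only an illustration under stated assumptions. It assumes the standard dynamic-programming recursion for the LCS length, L(i,j) = max(L(i-1,j), L(i,j-1), L(i-1,j-1) + eta(i,j)), where eta(i,j) is 1 when the i-th letter of one sequence matches the j-th letter of the other. It further assumes that the BM model replaces these match variables with independent Bernoulli variables of success probability 1/c. The function names `lcs_length` and `bm_length` are hypothetical and are not taken from the paper.

```python
import random


def lcs_length(a, b):
    """LCS length of sequences a and b via the standard recursion."""
    prev = [0] * (len(b) + 1)          # row i-1 of the table
    for x in a:
        curr = [0]                     # L(i, 0) = 0
        for j, y in enumerate(b, start=1):
            eta = 1 if x == y else 0   # match variable from actual letters
            curr.append(max(prev[j], curr[j - 1], prev[j - 1] + eta))
        prev = curr
    return prev[-1]


def bm_length(n, m, c, rng):
    """Same recursion, but each eta(i, j) is an independent
    Bernoulli(1/c) draw (assumed definition of the BM model)."""
    p = 1.0 / c
    prev = [0] * (m + 1)
    for _ in range(n):
        curr = [0]
        for j in range(1, m + 1):
            eta = 1 if rng.random() < p else 0   # independent match variable
            curr.append(max(prev[j], curr[j - 1], prev[j - 1] + eta))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    rng = random.Random(0)
    c, n = 4, 200
    # One sample from each model at the same alphabet size and length.
    s1 = [rng.randrange(c) for _ in range(n)]
    s2 = [rng.randrange(c) for _ in range(n)]
    print("random-sequence LCS length:", lcs_length(s1, s2))
    print("BM-model length:", bm_length(n, n, c, rng))
```

Repeating the `bm_length` call over many independent samples and histogramming the results would give an empirical distribution to compare, after suitable centering and scaling, with the Tracy-Widom form named in the abstract. The sketch does not attempt that comparison.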
- Publication:
- Physical Review E
- Pub Date:
- August 2005
- DOI:
- 10.1103/PhysRevE.72.020901
- arXiv:
- arXiv:q-bio/0410012
- Bibcode:
- 2005PhRvE..72b0901M
- Keywords:
- 87.10.+e;
- 02.50.-r;
- 05.40.-a;
- 87.15.Cc;
- General theory and mathematical aspects;
- Probability theory, stochastic processes, and statistics;
- Fluctuation phenomena, random processes, noise, and Brownian motion;
- Folding and sequence analysis;
- Genomics;
- Statistical Mechanics;
- Statistics
- E-Print:
- 4 pages Revtex, 2 .eps figures included