Physical complexity of symbolic sequences
Abstract
A practical measure for the complexity of sequences of symbols (“strings”) is introduced that is rooted in automata theory but avoids the problems of Kolmogorov-Chaitin complexity. This physical complexity can be estimated for ensembles of sequences, for which it reduces to the difference between the maximal entropy of the ensemble and its actual entropy given the specific environment within which the sequence is to be interpreted. Thus, the physical complexity measures the amount of information about the environment that is coded in the sequence, and is conditional on that environment. In practice, the complexity of a string can be estimated by counting the number of loci per string that are fixed in the ensemble; the volatile positions represent randomness, again with respect to the environment. We apply this measure to tRNA sequence data.
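The estimate described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: it assumes an ensemble of aligned, equal-length strings, treats positions as independent, and measures per-site entropy in logarithms to the base of the alphabet size, so that the maximal entropy of a string of length L is L. The function names (`site_entropy`, `physical_complexity`, `count_fixed_loci`) are hypothetical, and no correction for finite sample size is applied.

```python
from collections import Counter
from math import log


def site_entropy(column, alphabet_size):
    """Shannon entropy of one aligned position, in base `alphabet_size`.

    A fixed position gives 0; a uniformly random position gives 1.
    """
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * log(c / n, alphabet_size) for c in counts.values())


def physical_complexity(sequences, alphabet_size=4):
    """Estimate complexity as maximal entropy minus observed entropy.

    Assumption (not from the source): sites are independent, so the
    ensemble entropy is approximated by the sum of per-site entropies.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    # zip(*sequences) iterates over aligned columns (positions).
    entropies = [site_entropy(col, alphabet_size) for col in zip(*sequences)]
    return length - sum(entropies)


def count_fixed_loci(sequences):
    """Cruder estimate: number of positions identical across the ensemble."""
    return sum(1 for col in zip(*sequences) if len(set(col)) == 1)


if __name__ == "__main__":
    # Toy ensemble, invented for illustration only (not tRNA data).
    ensemble = ["ACGU", "ACGA", "ACGC", "ACGG"]
    print(physical_complexity(ensemble), count_fixed_loci(ensemble))
```

In the toy ensemble the first three positions are fixed and the last varies uniformly over the four symbols, so the entropy-based estimate and the fixed-locus count agree. For real data, where positions are partially conserved, the two estimates would generally differ.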
- Publication:
- Physica D Nonlinear Phenomena
- Pub Date:
- March 2000
- DOI:
- 10.1016/S0167-2789(99)00179-7
- arXiv:
- arXiv:adap-org/9605002
- Bibcode:
- 2000PhyD..137...62A
- Keywords:
- Nonlinear Sciences - Adaptation and Self-Organizing Systems;
- Condensed Matter - Statistical Mechanics;
- Physics - Data Analysis, Statistics and Probability;
- Quantitative Biology - Genomics
- E-Print:
- 12 pages LaTeX2e, 3 postscript figures, uses elsart.cls. Substantially improved and clarified version, includes application to EMBL tRNA sequence data