Entropy estimation of symbol sequences
Abstract
We discuss algorithms for estimating the Shannon entropy h of finite symbol sequences with long-range correlations. In particular, we consider algorithms which estimate h from the code lengths produced by some compression algorithm. Our interest is in describing their convergence with sequence length, assuming no limits on the space and time complexity of the compression algorithms. A scaling law is proposed for extrapolation from finite sample lengths. This is applied to sequences generated by dynamical systems in nontrivial chaotic regimes, a 1D cellular automaton, and to written English texts.
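The core idea of compression-based entropy estimation can be illustrated with a minimal sketch: the length of the code a lossless compressor produces for a sequence of n symbols, divided by n, upper-bounds the entropy rate h (in bits per symbol). The sketch below uses zlib as a stand-in compressor; the specific compressor and test sequences are illustrative assumptions, not the paper's setup, and no finite-length extrapolation is attempted.

```python
import random
import zlib


def entropy_estimate(seq: str) -> float:
    """Crude upper bound on the entropy rate h (bits per symbol),
    taken as the compressed code length divided by sequence length.
    zlib is used here purely as an example compressor."""
    data = seq.encode("ascii")
    compressed = zlib.compress(data, level=9)
    return 8 * len(compressed) / len(data)


# A perfectly periodic sequence is highly predictable, so its code
# length per symbol (and hence the estimated h) should be near zero;
# a random binary sequence should come out near 1 bit per symbol.
random.seed(0)
periodic = "ab" * 5000
rand = "".join(random.choice("ab") for _ in range(10000))

h_per = entropy_estimate(periodic)
h_rand = entropy_estimate(rand)
```

For finite n the estimate converges to h only slowly, which is precisely the motivation for the scaling-law extrapolation discussed in the paper.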
 Publication:

Chaos
 Pub Date:
 September 1996
 DOI:
 10.1063/1.166191
 arXiv:
 arXiv:cond-mat/0203436
 Bibcode:
 1996Chaos...6..414S
 Keywords:

 Condensed Matter - Statistical Mechanics;
 Computer Science - Computation and Language;
 Computer Science - Information Theory;
 Physics - Data Analysis, Statistics and Probability;
 Statistics - Machine Learning
 E-Print:
 14 pages, 13 figures, 2 tables