A Proof of Entropy Minimization for Outputs in Deletion Channels via Hidden Word Statistics
Abstract
Given the output of a memoryless deletion channel whose input is uniformly random with known length $n$, one obtains a posterior distribution on the channel input. The difference between the Shannon entropy of this distribution and that of the uniform prior measures the amount of information about the channel input conveyed by an output of length $m$, and it is natural to ask for which outputs this quantity is extremized. This question was posed in a previous work, where it was conjectured on the basis of experimental data that the entropy of the posterior is minimized by the constant strings $\texttt{000}\ldots$ and $\texttt{111}\ldots$ and maximized by the alternating strings $\texttt{0101}\ldots$ and $\texttt{1010}\ldots$. In the present work we confirm the minimization conjecture in the asymptotic limit using results from hidden word statistics. We show how the analytic-combinatorial methods of Flajolet, Szpankowski and Vallée for the hidden pattern matching problem can be applied to resolve the case of fixed output length and $n\rightarrow\infty$: we obtain estimates for the entropy in terms of the moments of the posterior distribution and establish its minimization via a measure of autocorrelation.
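To make the quantity being extremized concrete, here is a small brute-force sketch. It is not the paper's analytic method, which treats $n\rightarrow\infty$. It assumes a binary alphabet and i.i.d. deletions with some probability $d$. Under these assumptions, $P(y \mid x) = d^{\,n-m}(1-d)^m N_y(x)$, where $N_y(x)$ is the number of times $y$ occurs as a hidden word (subsequence) in $x$. With a uniform prior, the posterior is therefore proportional to $N_y(x)$ and $d$ cancels. The helper names `subsequence_count`, `posterior_weights` and `posterior_entropy` are hypothetical illustrations. Enumeration over all $2^n$ inputs limits this to small $n$.

```python
from itertools import product
from math import comb, log2


def subsequence_count(x: str, y: str) -> int:
    """N_y(x): number of index sets i_1 < ... < i_m with x[i_1] ... x[i_m] == y."""
    # dp[j] = number of embeddings of the prefix y[:j] into the part of x read so far
    dp = [1] + [0] * len(y)
    for ch in x:
        # walk j downwards so one character of x extends each embedding at most once
        for j in range(len(y), 0, -1):
            if y[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[len(y)]


def posterior_weights(y: str, n: int) -> list[int]:
    """Unnormalized posterior over all binary inputs x of length n, given output y.

    Assumes i.i.d. deletions: P(y | x) = d^(n-m) (1-d)^m N_y(x), so with a
    uniform prior the posterior is proportional to N_y(x) (d cancels).
    """
    return [subsequence_count("".join(bits), y) for bits in product("01", repeat=n)]


def posterior_entropy(y: str, n: int) -> float:
    """Shannon entropy (in bits) of the posterior on the length-n input given y."""
    w = posterior_weights(y, n)
    total = sum(w)
    # sanity check: summing N_y(x) over all x counts (x, position set) pairs
    assert total == comb(n, len(y)) * 2 ** (n - len(y))
    return -sum((c / total) * log2(c / total) for c in w if c > 0)


if __name__ == "__main__":
    n, m = 8, 3
    outputs = ["".join(bits) for bits in product("01", repeat=m)]
    # rank outputs from most informative (lowest posterior entropy) to least
    for y in sorted(outputs, key=lambda s: posterior_entropy(s, n)):
        h = posterior_entropy(y, n)
        # the uniform prior on {0,1}^n has entropy n bits
        print(f"{y}: H = {h:.4f} bits, information gained = {n - h:.4f} bits")
```

Running the script ranks every output of length $m$ for one small $n$. This only illustrates the finite case that motivated the conjecture and says nothing on its own about the asymptotic result.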
- Publication: arXiv e-prints
- Pub Date: July 2018
- arXiv: arXiv:1807.11609
- Bibcode: 2018arXiv180711609A
- Keywords: Computer Science - Information Theory; Computer Science - Discrete Mathematics
- E-Print: 11 pages, 2 figures