On the Maximum Number of Non-Confusable Strings Evolving Under Short Tandem Duplications
Abstract
The set of all $ q $-ary strings that do not contain repeated substrings of length $ \leqslant\! 3 $ (i.e., that do not contain substrings of the form $ a a $, $ a b a b $, or $ a b c a b c $) constitutes a code correcting an arbitrary number of tandem-duplication mutations of length $ \leqslant\! 3 $. In other words, any two such strings are non-confusable in the sense that they cannot produce the same string while evolving under tandem duplications of length $ \leqslant\! 3 $. We demonstrate that this code is asymptotically optimal in terms of rate, meaning that it represents the largest set of non-confusable strings up to subexponential factors. This result settles the zero-error capacity problem for the last remaining case of tandem-duplication channels satisfying the "root-uniqueness" property.
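The definitions in the abstract can be made concrete with a minimal Python sketch. It is an illustration only, not the paper's construction or proof: the function names (`has_short_tandem_repeat`, `tandem_duplicate`, `irreducible_strings`) and the brute-force enumeration are assumptions introduced here. The sketch tests whether a string contains a substring of the form $ a a $, $ a b a b $, or $ a b c a b c $, applies a single tandem duplication, and lists the codewords of a given length over a small alphabet.

```python
from itertools import product


def has_short_tandem_repeat(s, max_len=3):
    """Return True if s contains a substring uu with 1 <= len(u) <= max_len."""
    for k in range(1, max_len + 1):
        for i in range(len(s) - 2 * k + 1):
            if s[i:i + k] == s[i + k:i + 2 * k]:
                return True
    return False


def tandem_duplicate(s, start, length):
    """Insert a copy of the block s[start:start+length] right after itself."""
    block = s[start:start + length]
    if length < 1 or start < 0 or len(block) != length:
        raise ValueError("block must lie entirely inside the string")
    return s[:start + length] + block + s[start + length:]


def irreducible_strings(alphabet, n, max_len=3):
    """Brute-force list of length-n strings with no tandem repeat of block length <= max_len."""
    words = ("".join(w) for w in product(alphabet, repeat=n))
    return [w for w in words if not has_short_tandem_repeat(w, max_len)]
```

For example, `tandem_duplicate("abc", 1, 2)` duplicates the block `bc` and yields `"abcbc"`, which is no longer a codeword because it contains `bcbc`. The enumeration is exponential in `n`, so it is only suitable for checking small cases by hand.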
- Publication: arXiv e-prints
- Pub Date: November 2019
- DOI: 10.48550/arXiv.1911.06561
- arXiv: arXiv:1911.06561
- Bibcode: 2019arXiv191106561K
- Keywords: Computer Science - Information Theory; Computer Science - Discrete Mathematics; 94A24; 94A40; 94B25; 94B50; 68R15
- E-Print: 10 pages