Iterative DNA Coding Scheme With GC Balance and Run-Length Constraints Using a Greedy Algorithm
Abstract
In this paper, we propose a novel iterative encoding algorithm for DNA storage that satisfies both the GC balance and run-length constraints. DNA strands with a run-length greater than three or a GC ratio far from 50\% are known to be prone to errors. The proposed encoding algorithm stores data at high information density while flexibly supporting a run-length of at most $m$ and a GC ratio within $0.5\pm\alpha$ for arbitrary $m$ and $\alpha$. More importantly, we propose a novel mapping method, built with a greedy algorithm, that reduces the average bit error compared to a randomly generated mapping. The proposed algorithm is implemented through iterative encoding consisting of three main steps: randomization, M-ary mapping, and verification. It achieves an information density of 1.8523 bits/nt for $m=3$ and $\alpha=0.05$. The proposed algorithm is also robust to error propagation: a single-nt error causes an average bit error of 2.3455 bits, which is $20.5\%$ lower than with the randomized mapping.
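To make the two constraints concrete, the following is a minimal sketch of how a verification step might check a candidate strand. It is not the paper's implementation: the function names `max_run_length`, `gc_ratio`, and `satisfies_constraints` are hypothetical, and the sketch assumes strands are plain strings over the alphabet A, C, G, T, with `m` and `alpha` playing the roles of $m$ and $\alpha$ above.

```python
def max_run_length(strand: str) -> int:
    """Return the length of the longest run of identical consecutive bases."""
    longest = 0
    current = 0
    prev = None
    for base in strand:
        # Extend the current run if the base repeats, otherwise start a new run.
        current = current + 1 if base == prev else 1
        longest = max(longest, current)
        prev = base
    return longest


def gc_ratio(strand: str) -> float:
    """Return the fraction of bases in the strand that are G or C."""
    if not strand:
        raise ValueError("strand must be non-empty")
    return sum(base in "GC" for base in strand) / len(strand)


def satisfies_constraints(strand: str, m: int = 3, alpha: float = 0.05) -> bool:
    """Check run-length <= m and GC ratio within 0.5 +/- alpha (assumed check)."""
    return max_run_length(strand) <= m and abs(gc_ratio(strand) - 0.5) <= alpha


if __name__ == "__main__":
    for candidate in ("ACGTACGTAC", "AAAACGCG", "ATATATATGC"):
        print(candidate, satisfies_constraints(candidate))
```

In an iterative scheme of the kind described, a strand that fails such a check would presumably be re-randomized and re-encoded until it passes; the exact retry mechanism is not specified in the abstract.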
- Publication: arXiv e-prints
- Pub Date: March 2021
- DOI: 10.48550/arXiv.2103.03540
- arXiv: arXiv:2103.03540
- Bibcode: 2021arXiv210303540P
- Keywords: Computer Science - Information Theory; Electrical Engineering and Systems Science - Signal Processing
- E-Print: 19 pages