Coding over Sets for DNA Storage
Abstract
In this paper we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where a data set is represented by an unordered set of $M$ sequences, each of length $L$. Errors within that model are a loss of whole sequences and point errors inside the sequences, such as insertions, deletions and substitutions. We derive Gilbert-Varshamov lower bounds and sphere packing upper bounds on achievable cardinalities of error-correcting codes within this storage model. We further propose explicit code constructions than can correct errors in such a storage system that can be encoded and decoded efficiently. Comparing the sizes of these codes to the upper bounds, we show that many of the constructions are close to optimal.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2018
- DOI:
- 10.48550/arXiv.1812.02936
- arXiv:
- arXiv:1812.02936
- Bibcode:
- 2018arXiv181202936L
- Keywords:
-
- Computer Science - Information Theory;
- 94B60
- E-Print:
- 20 pages