Representing Information on DNA using Patterns Induced by Enzymatic Labeling
Abstract
Enzymatic DNA labeling is a powerful tool with applications in biochemistry, molecular biology, biotechnology, medical science, and genomic research. This paper contributes to the evolving field of DNA-based data storage by presenting a formal framework for modeling DNA labeling in strings, specifically tailored for data storage purposes. Our approach involves a known DNA molecule as a template for labeling, employing patterns induced by a set of designed labels to represent information. One hypothetical implementation can use CRISPR-Cas9 and gRNA reagents for labeling. Various aspects of the general labeling channel, including fixed-length labels, are explored, and upper bounds on the maximal size of the corresponding codes are given. The study includes the development of an efficient encoder-decoder pair that is proven optimal in terms of maximum code size under specific conditions.
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2024
- DOI:
- 10.48550/arXiv.2405.08475
- arXiv:
- arXiv:2405.08475
- Bibcode:
- 2024arXiv240508475B
- Keywords:
-
- Computer Science - Information Theory
- E-Print:
- Accepted to The IEEE International Symposium on Information Theory (ISIT) 2024