Improved constructions of secondary structure avoidance codes for DNA sequences
Abstract
In a DNA sequence, the Watson-Crick complement is given by $\overline{T}=A$, $\overline{A}=T$, $\overline{C}=G$, and $\overline{G}=C$. Given an integer $m\ge 2$, a secondary structure in a DNA sequence refers to the existence of two non-overlapping, reverse-complement consecutive subsequences of length $m$, denoted $\boldsymbol{x}=(x_1, \dots, x_m)$ and $\boldsymbol{y}=(y_1, \dots, y_m)$, such that $x_i=\overline{y_{m-i+1}}$ for $1\leq i \leq m$. The secondary structure avoidance (SSA) property forbids a sequence from containing such reverse-complement subsequences, and it is a key criterion in the design of single-stranded DNA sequences for DNA computing and storage. In this paper, we improve on a recent result of Nguyen et al. by introducing explicit constructions of secondary structure avoidance codes and analyzing the capacity for any given $m$. In particular, our constructions have optimal rates of 1.1679 bits/nt and 1.5515 bits/nt when $m=2$ and $m=3$, respectively.
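To make the definition concrete, the following is a minimal Python sketch of a checker for the SSA property as stated in the abstract. It is an illustration only, not the paper's construction; the function names `reverse_complement` and `has_secondary_structure` are hypothetical, and the input is assumed to be a string over the alphabet A, C, G, T.

```python
# Illustrative checker for the SSA property defined in the abstract.
# Names are hypothetical; this is not the code construction from the paper.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(s: str) -> str:
    """Return the Watson-Crick reverse complement of s."""
    return "".join(COMPLEMENT[base] for base in reversed(s))


def has_secondary_structure(seq: str, m: int) -> bool:
    """Return True if seq contains two non-overlapping length-m substrings
    x and y with x_i equal to the complement of y_{m-i+1}."""
    seen = set()  # length-m windows that end at or before the current start
    for j in range(len(seq) - m + 1):
        if j >= m:
            # The window starting at j - m ends exactly where window j
            # begins, so from now on it cannot overlap any later window.
            seen.add(seq[j - m:j])
        if reverse_complement(seq[j:j + m]) in seen:
            return True
    return False
```

For example, with $m=2$ the sequence `ACGT` violates the property, because `AC` and `GT` are non-overlapping reverse complements of each other, whereas `AAAC` satisfies it. Since reverse complementation is an involution, scanning only for an earlier window that matches the reverse complement of a later one covers both orderings of $\boldsymbol{x}$ and $\boldsymbol{y}$.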
- Publication: arXiv e-prints
- Pub Date: April 2023
- arXiv: arXiv:2304.11403
- Bibcode: 2023arXiv230411403C
- Keywords: Computer Science - Information Theory
- E-Print: Submitted to ISTC'23 (International Symposium on Topics in Coding)