Bounds and Constructions of $\ell$-Read Codes under the Hamming Metric
Abstract
Nanopore sequencing is a promising technology for DNA sequencing. In this paper, we investigate a specific model of the nanopore sequencer, which takes a $q$-ary sequence of length $n$ as input and outputs a vector of length $n+\ell-1$, referred to as an $\ell$-read vector, whose $i$-th entry is the multiset of the $\ell$ elements located between the $(i-\ell+1)$-th and $i$-th positions of the input sequence. Considering the presence of substitution errors in the output vector, we study $\ell$-read codes under the Hamming metric. An $\ell$-read $(n,d)_q$-code is a set of $q$-ary sequences of length $n$ in which the Hamming distance between the $\ell$-read vectors of any two distinct sequences is at least $d$. We first improve the result of Banerjee \emph{et al.}, who studied $\ell$-read $(n,d)_q$-codes with $\ell\geq 3$ and $d=3$. Then, we investigate bounds and constructions of $2$-read codes with minimum distance $3$, $4$, and $5$. Our results show that when $d \in \{3,4\}$, the optimal redundancy of $2$-read $(n,d)_q$-codes is $o(\log_q n)$, while for $d=5$ it is $\log_q n+o(\log_q n)$. Additionally, we establish an equivalence between $2$-read $(n,3)_q$-codes and classical $q$-ary single-insertion reconstruction codes using two noisy reads. We improve the lower bound on the redundancy of classical $q$-ary single-insertion reconstruction codes, as well as the upper bound on the redundancy of classical $q$-ary single-deletion reconstruction codes, when using two noisy reads. Finally, we study $\ell$-read codes under the reconstruction model.
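The $\ell$-read vector and the distance condition defining an $\ell$-read $(n,d)_q$-code can be made concrete with a small brute-force sketch. It assumes that positions outside $1,\dots,n$ are simply omitted, so the first and last $\ell-1$ entries hold fewer than $\ell$ symbols; this boundary convention, the function names (`read_vector`, `read_distance`, `min_read_distance`, `is_read_code`, `greedy_read_code`), and the greedy lexicographic search are hypothetical illustrations, not the constructions or bounds of the paper.

```python
from itertools import combinations, product


def read_vector(x, ell):
    """Return the ell-read vector of x: a tuple of n + ell - 1 multisets.

    Each multiset is stored as a sorted tuple, so it is hashable and two
    windows compare equal exactly when they hold the same symbols with the
    same multiplicities. Assumption: positions outside 1..n are omitted,
    so the boundary windows contain fewer than ell symbols.
    """
    n = len(x)
    vec = []
    for i in range(1, n + ell):  # 1-indexed entries i = 1, ..., n + ell - 1
        lo, hi = max(1, i - ell + 1), min(n, i)  # window [i-ell+1, i] clipped to [1, n]
        vec.append(tuple(sorted(x[lo - 1:hi])))
    return tuple(vec)


def read_distance(x, y, ell):
    """Hamming distance between the ell-read vectors of two length-n sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(read_vector(x, ell), read_vector(y, ell)))


def min_read_distance(code, ell):
    """Minimum pairwise read distance; raises ValueError if code has < 2 words."""
    return min(read_distance(x, y, ell) for x, y in combinations(code, 2))


def is_read_code(code, ell, d):
    """True if every pair of distinct codewords has read distance >= d."""
    return all(read_distance(x, y, ell) >= d for x, y in combinations(code, 2))


def greedy_read_code(n, q, ell, d):
    """Hypothetical illustration: scan all q**n sequences in lexicographic
    order and keep each one at read distance >= d from every word kept so
    far. Exponential in n, so only usable for tiny parameters."""
    code = []
    for x in product(range(q), repeat=n):
        if all(read_distance(x, c, ell) >= d for c in code):
            code.append(x)
    return code


if __name__ == "__main__":
    print(read_vector((0, 1, 0), 2))         # ((0,), (0, 1), (0, 1), (0,))
    print(read_distance((0, 1), (1, 0), 2))  # 2
    C = greedy_read_code(4, 2, 2, 3)
    print(len(C), is_read_code(C, 2, 3))
```

Under this definition, substituting one symbol of $x$ changes exactly the $\ell$ windows that cover that position, so, for example, `read_distance((0, 1, 2, 1, 0), (0, 1, 0, 1, 0), 3)` returns 3. The greedy search is only a sanity check for small cases; it says nothing about the optimal redundancy results stated above.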
- Publication: arXiv e-prints
- Pub Date: March 2024
- DOI: 10.48550/arXiv.2403.11754
- arXiv: arXiv:2403.11754
- Bibcode: 2024arXiv240311754S
- Keywords: Computer Science - Information Theory