Deterministic Indexing for Packed Strings
Abstract
Given a string $S$ of length $n$, the classic string indexing problem is to preprocess $S$ into a compact data structure that supports efficient subsequent pattern queries. In the \emph{deterministic} variant the goal is to solve the string indexing problem without any randomization (at preprocessing time or query time). In the \emph{packed} variant the strings are stored with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. Our main result is a new string index in the deterministic \emph{and} packed setting. Given a packed string $S$ of length $n$ over an alphabet of size $\sigma$, we show how to preprocess $S$ in $O(n)$ (deterministic) time and $O(n)$ space such that, given a packed pattern string of length $m$, we can support queries in (deterministic) time $O\left(m/\alpha + \log m + \log \log \sigma\right)$, where $\alpha = w / \log \sigma$ is the number of characters packed in a word of size $w = \Theta(\log n)$. Our query time is always at least as good as the previous best known bounds, and whenever several characters are packed in a word, i.e., $\log \sigma \ll w$, the query time is faster.
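To make the packed setting concrete, the following Python sketch stores $\alpha = \lfloor w / \lceil \log_2 \sigma \rceil \rfloor$ characters per $w$-bit word and computes a longest common prefix one word at a time. It illustrates only the packed representation, not the index described in the abstract. The names `pack` and `packed_lcp`, the slot layout (character $i$ in the low-order bits first), and the choice $w = 64$ are assumptions made for illustration; Python integers are unbounded, so the word size is simulated.

```python
def pack(codes, sigma, w=64):
    """Pack integer character codes (each in [0, sigma)) into w-bit words.

    Hypothetical helper: slot 0 of each word occupies the lowest bits.
    Returns (words, bits_per_char, chars_per_word).
    """
    bits = max(1, (sigma - 1).bit_length())  # ceil(log2(sigma)), at least 1
    alpha = w // bits                        # characters per word
    words = []
    for start in range(0, len(codes), alpha):
        word = 0
        for slot, c in enumerate(codes[start:start + alpha]):
            word |= c << (slot * bits)
        words.append(word)                   # last word is zero-padded
    return words, bits, alpha


def packed_lcp(a_words, a_len, b_words, b_len, bits, alpha):
    """Length of the longest common prefix, comparing alpha characters per step."""
    limit = min(a_len, b_len)
    for i, (x, y) in enumerate(zip(a_words, b_words)):
        diff = x ^ y
        if diff:
            # The lowest set bit of the XOR marks the first differing slot.
            slot = ((diff & -diff).bit_length() - 1) // bits
            # Cap at limit in case the mismatch lies in zero padding.
            return min(i * alpha + slot, limit)
    return limit


# Example over a DNA alphabet (sigma = 4, so 2 bits per character, alpha = 32).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
text = [CODE[c] for c in "ACGT" * 10]
pattern = [CODE[c] for c in "ACGT" * 9 + "ACGA"]
t_words, bits, alpha = pack(text, sigma=4)
p_words, _, _ = pack(pattern, sigma=4)
lcp = packed_lcp(t_words, len(text), p_words, len(pattern), bits, alpha)
```

In this example the 40-character strings occupy two words each, so the comparison inspects two words rather than 40 individual characters, which is the source of the $m/\alpha$ term in the query bound.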
- Publication: arXiv e-prints
- Pub Date: December 2016
- DOI: 10.48550/arXiv.1612.01748
- arXiv: arXiv:1612.01748
- Bibcode: 2016arXiv161201748B
- Keywords: Computer Science - Data Structures and Algorithms; E.1; F.2.2; E.4