On Prefix Normal Words and Prefix Normal Forms
Abstract
A $1$-prefix normal word is a binary word with the property that no factor has more $1$s than the prefix of the same length; a $0$-prefix normal word is defined analogously. These words arise in the context of indexed binary jumbled pattern matching, where the aim is to decide whether a word has a factor with a given number of $1$s and $0$s (a given Parikh vector). Each binary word has an associated set of Parikh vectors of the factors of the word. Using prefix normal words, we provide a characterization of the equivalence class of binary words having the same set of Parikh vectors of their factors. We prove that the language of prefix normal words is not context-free and is strictly contained in the language of pre-necklaces, which are prefixes of powers of Lyndon words. We give enumeration results on $\textit{pnw}(n)$, the number of prefix normal words of length $n$, showing that, for sufficiently large $n$, \[ 2^{n - 4\sqrt{n \lg n}} \le \textit{pnw}(n) \le 2^{n - \lg n + 1}. \] For fixed density (number of $1$s), we show that the ordinary generating function of the number of prefix normal words of length $n$ and density $d$ is a rational function. Finally, we give experimental results on $\textit{pnw}(n)$, discuss further properties, and state open problems.
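The definitions above can be checked by brute force for small lengths. The following Python sketch is illustrative only and is not the authors' code. The helper names (`max_letter_function`, `is_prefix_normal`, `prefix_normal_form`, `parikh_set`, `pnw`) are hypothetical. Building the prefix normal form from the increments of the maximum-count function $F(k)$ (the largest number of $1$s in any length-$k$ factor) is an assumption of this sketch. A simple quadratic sliding-window scan is used for clarity rather than speed.

```python
from itertools import product


def max_letter_function(w: str, letter: str = "1") -> list[int]:
    """F[k] = maximum number of `letter`s in any length-k factor of w (k = 0..n)."""
    n = len(w)
    F = [0] * (n + 1)
    for k in range(1, n + 1):
        count = w[:k].count(letter)  # letters in the window w[0:k]
        best = count
        for i in range(k, n):
            # Slide the window one step right: add w[i], drop w[i - k].
            count += (w[i] == letter) - (w[i - k] == letter)
            best = max(best, count)
        F[k] = best
    return F


def is_prefix_normal(w: str, letter: str = "1") -> bool:
    """True iff no factor of w has more `letter`s than the prefix of equal length."""
    F = max_letter_function(w, letter)
    return all(w[:k].count(letter) == F[k] for k in range(len(w) + 1))


def prefix_normal_form(w: str, letter: str = "1") -> str:
    """Assumed construction: position k holds `letter` exactly when F[k] > F[k-1],
    so each length-k prefix contains F[k] copies of `letter`."""
    other = "0" if letter == "1" else "1"
    F = max_letter_function(w, letter)
    return "".join(letter if F[k] > F[k - 1] else other
                   for k in range(1, len(w) + 1))


def parikh_set(w: str) -> set[tuple[int, int]]:
    """Parikh vectors (number of 0s, number of 1s) of all non-empty factors of w."""
    return {(w[i:j].count("0"), w[i:j].count("1"))
            for i in range(len(w)) for j in range(i + 1, len(w) + 1)}


def pnw(n: int) -> int:
    """Number of 1-prefix normal words of length n, by exhaustive search over 2^n words."""
    return sum(is_prefix_normal("".join(bits)) for bits in product("01", repeat=n))


if __name__ == "__main__":
    print([pnw(n) for n in range(1, 5)])   # [2, 3, 5, 8]
    print(prefix_normal_form("0110"))       # 1100
    # "1011" and "1101" are reversals of each other, so their Parikh sets agree;
    # they also share both their 1- and 0-prefix normal forms.
    print(parikh_set("1011") == parikh_set("1101"),
          prefix_normal_form("1011") == prefix_normal_form("1101"),
          prefix_normal_form("1011", "0") == prefix_normal_form("1101", "0"))
```

Exhaustive enumeration takes time exponential in $n$, so this is only a sanity check for small lengths, not a substitute for the enumeration bounds stated above.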
Publication: arXiv e-prints
Pub Date: November 2016
arXiv: arXiv:1611.09017
Bibcode: 2016arXiv161109017B
Keywords: Computer Science - Discrete Mathematics; Computer Science - Formal Languages and Automata Theory; Mathematics - Combinatorics
E-Print: To appear in Theoretical Computer Science