MR-RePair: Grammar Compression based on Maximal Repeats
Abstract
We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces, step by step, the most frequent pairs within the corresponding most frequent maximal repeats. We then design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar it generates with that of the grammar generated by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.
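To make the pair-by-pair behavior described in the abstract concrete, here is a minimal sketch of the basic RePair idea: repeatedly replace the most frequent adjacent pair with a fresh nonterminal. It is an illustration under simplifying assumptions, not the authors' implementation. The function names `repair` and `expand`, the rule names `R0`, `R1`, ..., the naive quadratic rescan, and the overlapping pair count are all choices made here for brevity. MR-RePair itself is not implemented in this sketch.

```python
from collections import Counter


def repair(text):
    """Naive RePair sketch: returns (final sequence, rules).

    Assumptions for illustration: pairs are counted with overlaps,
    and the sequence is rescanned from scratch on every round.
    """
    seq = list(text)          # terminals are single characters
    rules = {}                # nonterminal name -> (left, right)
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:          # no pair repeats, so nothing to gain
            break
        name = f"R{next_id}"  # hypothetical naming scheme
        next_id += 1
        rules[name] = pair
        # Replace occurrences of the pair greedily, left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the text it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    seq, rules = repair("abababab")
    print(seq, rules)
    print("".join(expand(s, rules) for s in seq))
```

On a repetitive input, the printed rules show a longer repeat being assembled from nested pair rules over several rounds. This is the step-by-step replacement that, according to the abstract, MR-RePair performs at once by substituting the most frequent maximal repeat directly.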
- Publication: arXiv e-prints
- Pub Date: November 2018
- DOI: 10.48550/arXiv.1811.04596
- arXiv: arXiv:1811.04596
- Bibcode: 2018arXiv181104596F
- Keywords: Computer Science - Data Structures and Algorithms