Memory-Efficient Sequential Pattern Mining with Hybrid Tries
Abstract
This paper develops a memory-efficient approach for Sequential Pattern Mining (SPM), a fundamental topic in knowledge discovery that faces a well-known memory bottleneck for large data sets. Our methodology involves a novel hybrid trie data structure that exploits recurring patterns to store the data set compactly in memory, and a corresponding mining algorithm designed to extract patterns effectively from this compact representation. Numerical results on small to medium-sized real-life test instances show an average improvement of 85% in memory consumption and 49% in computation time compared to the state of the art. For large data sets, our algorithm stands out as the only SPM approach capable of running within 256 GB of system memory, potentially saving 1.7 TB in memory consumption.
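To illustrate why a trie can store a sequence data set compactly, here is a minimal sketch of a plain prefix trie in Python. It is not the paper's hybrid trie or its mining algorithm, neither of which is specified in the abstract. The names `TrieNode`, `build_trie`, `count_nodes`, and `prefix_count`, as well as the toy data set, are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One node of a plain prefix trie over sequence events (illustrative only)."""
    children: dict = field(default_factory=dict)  # event -> TrieNode
    count: int = 0  # number of stored sequences whose prefix ends at this node


def build_trie(sequences):
    """Insert every sequence; sequences sharing a prefix share the same nodes."""
    root = TrieNode()
    for seq in sequences:
        node = root
        for event in seq:
            node = node.children.setdefault(event, TrieNode())
            node.count += 1
    return root


def count_nodes(node):
    """Count the nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def prefix_count(root, prefix):
    """Return how many stored sequences start with the non-empty `prefix`."""
    node = root
    for event in prefix:
        node = node.children.get(event)
        if node is None:
            return 0
    return node.count


# Hypothetical toy data set: three of the four sequences share the prefix a, b.
sequences = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["a", "b", "c", "e"],
    ["b", "c"],
]

trie = build_trie(sequences)
total_events = sum(len(seq) for seq in sequences)  # events stored without sharing
stored_nodes = count_nodes(trie)                   # nodes stored with prefix sharing

print(total_events, stored_nodes, prefix_count(trie, ["a", "b"]))
```

In this toy case the 12 events of the raw data set collapse into 7 trie nodes because repeated prefixes are stored once. Note that `prefix_count` only counts sequences that begin with the given prefix; the support of a general sequential pattern is defined over subsequences and needs a proper mining procedure.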
- Publication: arXiv e-prints
- Pub Date: February 2022
- DOI: 10.48550/arXiv.2202.06834
- arXiv: arXiv:2202.06834
- Bibcode: 2022arXiv220206834H
- Keywords: Computer Science - Databases; Computer Science - Artificial Intelligence; Computer Science - Data Structures and Algorithms; Computer Science - Machine Learning