Proteus: A Self-Designing Range Filter
Abstract
We introduce Proteus, a novel self-designing approximate range filter, which configures itself based on sampled data in order to optimize its false positive rate (FPR) for a given space requirement. Proteus unifies the probabilistic and deterministic design spaces of state-of-the-art range filters to achieve robust performance across a larger variety of use cases. At the core of Proteus lies our Contextual Prefix FPR (CPFPR) model - a formal framework for the FPR of prefix-based filters across their design spaces. We empirically demonstrate the accuracy of our model and Proteus' ability to optimize over both synthetic workloads and real-world datasets. We further evaluate Proteus in RocksDB and show that it is able to improve end-to-end performance by as much as 5.3x over more brittle state-of-the-art methods such as SuRF and Rosetta. Our experiments also indicate that the cost of modeling is not significant compared to the end-to-end performance gains and that Proteus is robust to workload shifts.
- Publication:
-
arXiv e-prints
- Pub Date:
- June 2022
- DOI:
- arXiv:
- arXiv:2207.01503
- Bibcode:
- 2022arXiv220701503K
- Keywords:
-
- Computer Science - Databases;
- Computer Science - Data Structures and Algorithms;
- Computer Science - Machine Learning;
- F.2.m;
- H.3.3
- E-Print:
- 14 pages, 9 figures, originally published in the Proceedings of the 2022 International Conference on Management of Data (SIGMOD'22), 9781450392495