Carbonyl4: A Sketch for Set-Increment Mixed Updates
Abstract
In the realm of data stream processing, the advent of SET-INCREMENT Mixed (SIM) data streams necessitates algorithms that efficiently handle both SET and INCREMENT operations. We present Carbonyl4, an innovative algorithm designed specifically for SIM data streams, ensuring accuracy, unbiasedness, and adaptability. Carbonyl4 introduces two pioneering techniques: the Balance Bucket for refined variance optimization, and the Cascading Overflow for maintaining precision amidst overflow scenarios. Our experiments across four diverse datasets establish Carbonyl4's supremacy over existing algorithms, particularly in terms of accuracy for item-level information retrieval and adaptability to fluctuating memory requirements. The versatility of Carbonyl4 is further demonstrated through its dynamic memory shrinking capability, achieved via a re-sampling and a heuristic approach. The source codes of Carbonyl4 are available at GitHub.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2024
- DOI:
- arXiv:
- arXiv:2412.16566
- Bibcode:
- 2024arXiv241216566Z
- Keywords:
-
- Computer Science - Data Structures and Algorithms