Adaptive Encoding Strategies for Erasing-Based Lossless Floating-Point Compression

doi:10.48550/arXiv.2308.11915

Lossless floating-point time series compression is crucial in a wide range of critical scenarios. It is nevertheless challenging to compress time series losslessly, because the underlying layouts of floating-point values are complex. The state-of-the-art erasing-based compression algorithm Elf demonstrates impressive performance, yet an in-depth exploration of its encoding strategies shows that there is still much room for improvement. In this paper, we propose Elf*, which employs a set of optimizations for leading zeros, center bits, and the sharing condition. Specifically, we develop a dynamic programming algorithm with a set of pruning strategies to compute the adaptive approximation rules efficiently, and we theoretically prove that these rules are globally optimal. We further extend Elf* to Streaming Elf*, i.e., SElf*, which achieves almost the same compression ratio as Elf* while enjoying even higher efficiency in streaming scenarios. We compare Elf* and SElf* with 8 competitors on 22 datasets. The results demonstrate that SElf* achieves a 9.2% relative compression ratio improvement over the best streaming competitor while maintaining similar efficiency, and that Elf* ranks among the most competitive batch compressors. All source code is publicly released.
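For intuition only, the general idea behind "erasing" can be sketched as follows. This is not the Elf or Elf* encoder described in the paper. It assumes that erasing means zeroing low-order mantissa bits of a double whose decimal precision is known, and it uses Python's `round` as a simplified stand-in for the restoration step. The helper names `to_bits`, `from_bits`, `erase`, and `trailing_zeros` are hypothetical.

```python
import struct


def to_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def from_bits(b: int) -> float:
    """Inverse of to_bits: rebuild a float from its 64-bit pattern."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def erase(x: float, decimals: int) -> float:
    """Zero as many low mantissa bits as possible while rounding to
    `decimals` places still recovers x exactly (illustrative assumption,
    not the restoration rule used by the paper)."""
    bits = to_bits(x)
    best = bits
    for k in range(1, 53):  # the binary64 mantissa has 52 explicit bits
        candidate = bits & ~((1 << k) - 1)  # clear the k lowest bits
        if round(from_bits(candidate), decimals) != x:
            break
        best = candidate
    return from_bits(best)


def trailing_zeros(b: int) -> int:
    """Count trailing zero bits in a 64-bit pattern."""
    return 64 if b == 0 else (b & -b).bit_length() - 1


if __name__ == "__main__":
    original = 3.17
    erased = erase(original, decimals=2)
    print(f"{to_bits(original):064b}")
    print(f"{to_bits(erased):064b}")
    print(round(erased, 2) == original)
```

The erased pattern ends in a long run of zero bits, which is the kind of redundancy a subsequent bit-level encoder can exploit, while the original value remains recoverable from the stored decimal precision.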

Publication: arXiv e-prints

Pub Date: August 2023

DOI: 10.48550/arXiv.2308.11915

arXiv: arXiv:2308.11915

Bibcode: 2023arXiv230811915L

Keywords: Computer Science - Data Structures and Algorithms
