Separations for Estimating Large Frequency Moments on Data Streams

doi:10.48550/arXiv.2105.03773

Separations for Estimating Large Frequency Moments on Data Streams

We study the classical problem of moment estimation of an underlying vector whose $n$ coordinates are implicitly defined through a series of updates in a data stream. We show that if the updates to the vector arrive in the random-order insertion-only model, then there exist space efficient algorithms with improved dependencies on the approximation parameter $\varepsilon$. In particular, for any real $p > 2$, we first obtain an algorithm for $F_p$ moment estimation using $\tilde{\mathcal{O}}\left(\frac{1}{\varepsilon^{4/p}}\cdot n^{1-2/p}\right)$ bits of memory. Our techniques also give algorithms for $F_p$ moment estimation with $p>2$ on arbitrary order insertion-only and turnstile streams, using $\tilde{\mathcal{O}}\left(\frac{1}{\varepsilon^{4/p}}\cdot n^{1-2/p}\right)$ bits of space and two passes, which is the first optimal multi-pass $F_p$ estimation algorithm up to $\log n$ factors. Finally, we give an improved lower bound of $\Omega\left(\frac{1}{\varepsilon^2}\cdot n^{1-2/p}\right)$ for one-pass insertion-only streams. Our results separate the complexity of this problem both between random and non-random orders, as well as one-pass and multi-pass streams.

Publication:

arXiv e-prints

Pub Date:

May 2021

DOI:

10.48550/arXiv.2105.03773

arXiv:

arXiv:2105.03773

Bibcode:

2021arXiv210503773W

Keywords:

Computer Science - Data Structures and Algorithms

E-Print:

ICALP 2021

ADS

Separations for Estimating Large Frequency Moments on Data Streams

Abstract