Straggler Identification in Round-Trip Data Streams via Newton's Identities and Invertible Bloom Filters
Abstract
We introduce the straggler identification problem, in which an algorithm must determine the identities of the remaining members of a set after it has had a large number of insertion and deletion operations performed on it, and now has relatively few remaining members. The goal is to do this in o(n) space, where n is the total number of identities. The straggler identification problem has applications, for example, in determining the set of unacknowledged packets in a high-bandwidth multicast data stream. We provide a deterministic solution to the straggler identification problem that uses only O(d log n) bits and is based on a novel application of Newton's identities for symmetric polynomials. This solution can identify any subset of d stragglers from a set of n O(log n)-bit identifiers, assuming that there are no false deletions of identities not already in the set. Indeed, we give a lower bound argument that shows that any small-space deterministic solution to the straggler identification problem cannot be guaranteed to handle false deletions. Nevertheless, we show that there is a simple randomized solution using O(d log n log(1/epsilon)) bits that can maintain a multiset and solve the straggler identification problem, tolerating false deletions, where epsilon > 0 is a user-defined parameter bounding the probability of an incorrect response. This randomized solution is based on a new type of Bloom filter, which we call the invertible Bloom filter.
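The deterministic idea described above can be illustrated with a small sketch. It is not necessarily the paper's exact construction: it assumes identifiers are distinct integers smaller than a fixed prime P, that no false deletions occur, and that at most d identifiers remain when the query is made. The class name PowerSumSketch, the choice of modulus, and the brute-force scan over the universe for roots are all assumptions made for illustration. The sketch stores a count and d power sums modulo P, which is O(d log n) bits, and uses Newton's identities to turn the power sums into the coefficients of a polynomial whose roots are the stragglers.

```python
# Hypothetical illustration; names and parameters are assumptions, not from the paper.
P = 2_147_483_647  # prime modulus (2^31 - 1); assumed larger than every identifier and than d


class PowerSumSketch:
    """Keeps a count and the power sums p_k = sum of x^k (mod P) for k = 1..d."""

    def __init__(self, d):
        self.d = d
        self.count = 0
        self.sums = [0] * d  # sums[k - 1] holds p_k

    def _update(self, x, sign):
        self.count += sign
        power = 1
        for k in range(self.d):
            power = power * x % P
            self.sums[k] = (self.sums[k] + sign * power) % P

    def insert(self, x):
        self._update(x, +1)

    def delete(self, x):
        # Assumes x was previously inserted (no false deletions).
        self._update(x, -1)

    def stragglers(self, universe):
        m = self.count
        if not 0 <= m <= self.d:
            raise ValueError("more than d identifiers remain, or a false deletion occurred")
        # Newton's identities: k * e_k = sum_{i=1..k} (-1)^(i-1) * e_{k-i} * p_i
        e = [1]
        for k in range(1, m + 1):
            acc = 0
            for i in range(1, k + 1):
                term = e[k - i] * self.sums[i - 1]
                acc += term if i % 2 == 1 else -term
            e.append(acc * pow(k, -1, P) % P)  # modular inverse needs Python 3.8+
        # prod_j (z - x_j) = sum_{k=0..m} (-1)^k * e_k * z^(m-k)
        coeffs = [(-1) ** k * e[k] % P for k in range(m + 1)]

        def evaluate(z):
            acc = 0
            for c in coeffs:  # Horner's rule, highest degree first
                acc = (acc * z + c) % P
            return acc

        # Brute-force root search over the universe, chosen for simplicity only.
        return [z for z in universe if evaluate(z) == 0]


sketch = PowerSumSketch(d=3)
for x in range(1, 1001):
    sketch.insert(x)
for x in range(1, 1001):
    if x not in (17, 256, 999):
        sketch.delete(x)
print(sketch.stragglers(range(1, 1001)))  # prints [17, 256, 999]
```

The root scan takes time linear in the universe size, so it illustrates correctness rather than efficiency. The sketch also shows why false deletions are a problem for this approach: deleting an identifier that was never inserted corrupts the count and the power sums in a way the structure cannot reliably detect.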
 Publication:

arXiv e-prints
 Pub Date:
 April 2007
 arXiv:
 arXiv:0704.3313
 Bibcode:
 2007arXiv0704.3313E
 Keywords:

 Computer Science - Data Structures and Algorithms;
 F.2.2
 E-Print:
 Fuller version of paper appearing in 10th Worksh. Algorithms and Data Structures, Halifax, Nova Scotia, 2007