Stream Clipper: Scalable Submodular Maximization on Stream
Abstract
We propose a streaming submodular maximization algorithm "stream clipper" that performs as well as the offline greedy algorithm on document/video summarization in practice. It adds elements from a stream either to a solution set $S$ or to an extra buffer $B$ based on two adaptive thresholds, and improves $S$ by a final greedy step that starts from $S$ adding elements from $B$. During this process, swapping elements out of $S$ can occur if doing so yields improvements. The thresholds adapt based on if current memory utilization exceeds a budget, e.g., it increases the lower threshold, and removes from the buffer $B$ elements below the new lower threshold. We show that, while our approximation factor in the worst case is $1/2$ (like in previous work, and corresponding to the tight bound), we show that there are datadependent conditions where our bound falls within the range $[1/2, 11/e]$. In news and video summarization experiments, the algorithm consistently outperforms other streaming methods, and, while using significantly less computation and memory, performs similarly to the offline greedy algorithm.
 Publication:

arXiv eprints
 Pub Date:
 June 2016
 arXiv:
 arXiv:1606.00389
 Bibcode:
 2016arXiv160600389Z
 Keywords:

 Statistics  Machine Learning;
 Computer Science  Machine Learning;
 Mathematics  Combinatorics
 EPrint:
 17 pages, 12 figures, submitted to conference