Provable Deterministic Leverage Score Sampling
Abstract
We give a theoretical explanation for a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate." To obtain provable guarantees, previous work requires randomized sampling of the columns with probabilities proportional to their leverage scores. In this work, we provide a novel theoretical analysis of deterministic leverage score sampling. We show that such deterministic sampling can be provably as accurate as its randomized counterparts if the leverage scores follow a moderately steep power-law decay. We support this power-law assumption with empirical evidence that such decay laws are abundant in real-world data sets. We then demonstrate empirically the performance of deterministic leverage score sampling, which often matches or outperforms state-of-the-art techniques.
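The selection rule described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the standard definition of rank-k column leverage scores (squared row norms of the top-k right singular vectors) and takes the number of selected columns `c` as a user-supplied parameter, since the abstract does not state how the paper chooses it. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def leverage_scores(A, k):
    """Rank-k column leverage scores of A (assumed standard definition).

    The score of column i is the squared Euclidean norm of the i-th row of
    V_k, where V_k holds the top-k right singular vectors of A.
    """
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return np.sum(Vt[:k, :] ** 2, axis=0)


def select_columns_deterministic(A, k, c):
    """Return the indices of the c columns with the largest leverage scores."""
    scores = leverage_scores(A, k)
    # Sort descending by score and keep the first c indices; no randomness.
    top = np.argsort(scores)[::-1][:c]
    return np.sort(top), scores


def column_projection_error(A, idx):
    """Frobenius error of projecting A onto the span of the chosen columns."""
    C = A[:, idx]
    residual = A - C @ np.linalg.pinv(C) @ A
    return np.linalg.norm(residual, "fro")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 30))  # synthetic data, for illustration only
    idx, scores = select_columns_deterministic(A, k=3, c=5)
    print("selected columns:", idx)
    print("projection error:", column_projection_error(A, idx))
```

The surrogate here is the projection of the matrix onto the span of the selected columns, which is one common way to measure the quality of a column subset; the paper's own error measure and choice of `c` may differ.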
- Publication: arXiv e-prints
- Pub Date: April 2014
- DOI: 10.48550/arXiv.1404.1530
- arXiv: arXiv:1404.1530
- Bibcode: 2014arXiv1404.1530P
- Keywords: Computer Science - Data Structures and Algorithms; Computer Science - Information Theory; Computer Science - Numerical Analysis; Mathematics - Statistics Theory; Statistics - Machine Learning
- E-Print: 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining