Near-Optimal Coresets of Kernel Density Estimates
Abstract
We construct near-optimal coresets for kernel density estimates for points in $\mathbb{R}^d$ when the kernel is positive definite. Specifically we show a polynomial time construction for a coreset of size $O(\sqrt{d}/\varepsilon\cdot \sqrt{\log 1/\varepsilon} )$, and we show a near-matching lower bound of size $\Omega(\min\{\sqrt{d}/\varepsilon, 1/\varepsilon^2\})$. When $d\geq 1/\varepsilon^2$, it is known that the size of coreset can be $O(1/\varepsilon^2)$. The upper bound is a polynomial-in-$(1/\varepsilon)$ improvement when $d \in [3,1/\varepsilon^2)$ and the lower bound is the first known lower bound to depend on $d$ for this problem. Moreover, the upper bound restriction that the kernel is positive definite is significant in that it applies to a wide-variety of kernels, specifically those most important for machine learning. This includes kernels for information distances and the sinc kernel which can be negative.
- Publication:
-
arXiv e-prints
- Pub Date:
- February 2018
- DOI:
- 10.48550/arXiv.1802.01751
- arXiv:
- arXiv:1802.01751
- Bibcode:
- 2018arXiv180201751P
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Computational Geometry;
- Statistics - Machine Learning
- E-Print:
- This paper is combined with arXiv:1710.04325