The Power of Uniform Sampling for Coresets
Abstract
Motivated by practical generalizations of the classic $k$-median and $k$-means objectives, such as clustering with size constraints, fair clustering, and Wasserstein barycenter, we introduce a meta-theorem for designing coresets for constrained-clustering problems. The meta-theorem reduces the task of coreset construction to one on a bounded number of ring instances with a much-relaxed additive error. This reduction enables us to construct coresets using uniform sampling, in contrast to the widely-used importance sampling, and consequently we can easily handle constrained objectives. Notably and perhaps surprisingly, this simpler sampling scheme can yield coresets whose size is independent of $n$, the number of input points. Our technique yields smaller coresets, and sometimes the first coresets, for a large number of constrained clustering problems, including capacitated clustering, fair clustering, Euclidean Wasserstein barycenter, clustering in minor-excluded graph, and polygon clustering under Fréchet and Hausdorff distance. Finally, our technique yields also smaller coresets for $1$-median in low-dimensional Euclidean spaces, specifically of size $\tilde{O}(\varepsilon^{-1.5})$ in $\mathbb{R}^2$ and $\tilde{O}(\varepsilon^{-1.6})$ in $\mathbb{R}^3$.
- Publication:
-
arXiv e-prints
- Pub Date:
- September 2022
- DOI:
- 10.48550/arXiv.2209.01901
- arXiv:
- arXiv:2209.01901
- Bibcode:
- 2022arXiv220901901B
- Keywords:
-
- Computer Science - Data Structures and Algorithms