A New Coreset Framework for Clustering
Abstract
Given a metric space, the $(k,z)$-clustering problem consists of finding $k$ centers such that the sum of the distances, raised to the power $z$, from every point to its closest center is minimized. This encapsulates the famous $k$-median ($z=1$) and $k$-means ($z=2$) clustering problems. Designing small-space sketches of the data that approximately preserve the cost of the solutions, also known as \emph{coresets}, has been an important research direction over the last 15 years. In this paper, we present a new, simple coreset framework that simultaneously improves upon the best known bounds for a large variety of settings, ranging from Euclidean spaces and doubling metrics to minor-free metrics and general metrics.
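To make the objective concrete, the cost described above can be written out as a formula. This is a sketch under assumed notation: the symbols $P$ (input point set), $S$ (candidate set of centers), $d$ (the metric), $\Omega$ (a weighted coreset with weights $w$), and $\varepsilon$ (the accuracy parameter) are introduced here for illustration and do not appear in the abstract. The second display states the $(1\pm\varepsilon)$ guarantee as coresets are commonly defined, which may differ in detail from the paper's exact definition.

```latex
% Assumes amsmath for \operatorname. All symbols (P, S, d, \Omega, w, \varepsilon)
% are illustrative notation, not taken from the abstract.

% (k,z)-clustering cost of a center set S for the point set P:
\[
  \operatorname{cost}_z(P, S) \;=\; \sum_{p \in P} \min_{s \in S} d(p, s)^z,
  \qquad |S| = k .
\]

% Commonly used coreset guarantee: the weighted cost on \Omega approximates
% the true cost for every candidate solution S simultaneously.
\[
  \Bigl|\, \sum_{q \in \Omega} w(q) \min_{s \in S} d(q, s)^z
  \;-\; \operatorname{cost}_z(P, S) \Bigr|
  \;\le\; \varepsilon \cdot \operatorname{cost}_z(P, S)
  \quad \text{for every } S \text{ with } |S| = k .
\]
```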
- Publication: arXiv e-prints
- Pub Date: April 2021
- DOI: 10.48550/arXiv.2104.06133
- arXiv: arXiv:2104.06133
- Bibcode: 2021arXiv210406133C
- Keywords: Computer Science - Data Structures and Algorithms
- E-Print: Improved presentation. Adds a simpler suboptimal proof for interesting points, and an improved analysis for planar graphs. Corrects errors in the construction of centroid sets