On Coresets for Clustering in Small Dimensional Euclidean Spaces
Abstract
We consider the problem of constructing small coresets for $k$Median in Euclidean spaces. Given a large set of data points $P\subset \mathbb{R}^d$, a coreset is a much smaller set $S\subset \mathbb{R}^d$, so that the $k$Median costs of any $k$ centers w.r.t. $P$ and $S$ are close. Existing literature mainly focuses on the highdimension case and there has been great success in obtaining dimensionindependent bounds, whereas the case for small $d$ is largely unexplored. Considering many applications of Euclidean clustering algorithms are in small dimensions and the lack of systematic studies in the current literature, this paper investigates coresets for $k$Median in small dimensions. For small $d$, a natural question is whether existing nearoptimal dimensionindependent bounds can be significantly improved. We provide affirmative answers to this question for a range of parameters. Moreover, new lower bound results are also proved, which are the highest for small $d$. In particular, we completely settle the coreset size bound for $1$d $k$Median (up to log factors). Interestingly, our results imply a strong separation between $1$d $1$Median and $1$d $2$Median. As far as we know, this is the first such separation between $k=1$ and $k=2$ in any dimension.
 Publication:

arXiv eprints
 Pub Date:
 February 2023
 DOI:
 10.48550/arXiv.2302.13737
 arXiv:
 arXiv:2302.13737
 Bibcode:
 2023arXiv230213737H
 Keywords:

 Computer Science  Data Structures and Algorithms