A Unified Framework for Clustering Constrained Data without Locality Property
Abstract
In this paper, we consider a class of constrained clustering problems of points in $\mathbb{R}^{d}$, where $d$ could be rather high. A common feature of these problems is that their optimal clusterings no longer have the locality property (due to the additional constraints), which is a key property required by many algorithms for their unconstrained counterparts. To overcome the difficulty caused by the loss of locality, we present in this paper a unified framework, called {\em Peeling-and-Enclosing (PnE)}, to iteratively solve two variants of the constrained clustering problems, {\em constrained $k$-means clustering} ($k$-CMeans) and {\em constrained $k$-median clustering} ($k$-CMedian). Our framework is based on two standalone geometric techniques, called {\em Simplex Lemma} and {\em Weaker Simplex Lemma}, for $k$-CMeans and $k$-CMedian, respectively. The simplex lemma (or weaker simplex lemma) enables us to efficiently approximate the mean (or median) point of an unknown set of points by searching a small-size grid, independent of the dimensionality of the space, in a simplex (or the surrounding region of a simplex), and thus can be used to handle high dimensional data. If $k$ and $\frac{1}{\epsilon}$ are fixed numbers, our framework generates, in nearly linear time ({\em i.e.,} $O(n(\log n)^{k+1}d)$), $O((\log n)^{k})$ $k$-tuple candidates for the $k$ mean or median points, and one of them induces a $(1+\epsilon)$-approximation for $k$-CMeans or $k$-CMedian, where $n$ is the number of points. Combining this unified framework with a problem-specific selection algorithm (which determines the best $k$-tuple candidate), we obtain a $(1+\epsilon)$-approximation for each of the constrained clustering problems. We expect that our technique will be applicable to other constrained clustering problems without locality.
- Publication:
-
arXiv e-prints
- Pub Date:
- October 2018
- DOI:
- 10.48550/arXiv.1810.01049
- arXiv:
- arXiv:1810.01049
- Bibcode:
- 2018arXiv181001049D
- Keywords:
-
- Computer Science - Computational Geometry;
- Computer Science - Machine Learning