Correlation Clustering with Asymmetric Classification Errors
Abstract
In the Correlation Clustering problem, we are given a weighted graph $G$ with its edges labeled as "similar" or "dissimilar" by a binary classifier. The goal is to produce a clustering that minimizes the weight of "disagreements": the sum of the weights of "similar" edges across clusters and "dissimilar" edges within clusters. We study Correlation Clustering under the following assumption: every "similar" edge $e$ has weight $\mathbf{w}_e\in[\alpha \mathbf{w}, \mathbf{w}]$ and every "dissimilar" edge $e$ has weight $\mathbf{w}_e\geq \alpha \mathbf{w}$ (where $\alpha\leq 1$ and $\mathbf{w}>0$ is a scaling parameter). This assumption captures the scenario in which classification errors are asymmetric. We give a $(3 + 2 \log_e (1/\alpha))$-approximation algorithm for this problem. Additionally, we show an asymptotically matching Linear Programming integrality gap of $\Omega(\log(1/\alpha))$.
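The objective above can be illustrated with a minimal sketch (not the paper's algorithm): a function that evaluates the weight of disagreements for a candidate clustering. The edge representation and function name are illustrative assumptions, not from the paper.

```python
def disagreement_cost(edges, cluster_of):
    """Total weight of disagreements for a given clustering.

    edges: list of (u, v, weight, label) tuples,
           label in {"similar", "dissimilar"}.
    cluster_of: dict mapping each vertex to its cluster id.
    """
    cost = 0.0
    for u, v, w, label in edges:
        same_cluster = cluster_of[u] == cluster_of[v]
        if label == "similar" and not same_cluster:
            cost += w  # "similar" edge cut across clusters
        elif label == "dissimilar" and same_cluster:
            cost += w  # "dissimilar" edge kept within a cluster
    return cost
```

For example, if a "similar" edge of weight 0.8 is split across clusters while all other edges agree with the clustering, the cost is 0.8.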
Publication: arXiv e-prints
Pub Date: August 2021
arXiv: arXiv:2108.05696
Bibcode: 2021arXiv210805696J
Keywords: Computer Science - Data Structures and Algorithms; Computer Science - Machine Learning
E-Print: 24 pages, 2 figures. The conference version of this paper appeared in the proceedings of ICML 2020.