Efficient Data Shapley for Weighted Nearest Neighbor Algorithms
Abstract
This work aims to address an open problem in data valuation literature concerning the efficient computation of Data Shapley for weighted $K$ nearest neighbor algorithm (WKNN-Shapley). By considering the accuracy of hard-label KNN with discretized weights as the utility function, we reframe the computation of WKNN-Shapley into a counting problem and introduce a quadratic-time algorithm, presenting a notable improvement from $O(N^K)$, the best result from existing literature. We develop a deterministic approximation algorithm that further improves computational efficiency while maintaining the key fairness properties of the Shapley value. Through extensive experiments, we demonstrate WKNN-Shapley's computational efficiency and its superior performance in discerning data quality compared to its unweighted counterpart.
- Publication:
-
arXiv e-prints
- Pub Date:
- January 2024
- DOI:
- arXiv:
- arXiv:2401.11103
- Bibcode:
- 2024arXiv240111103W
- Keywords:
-
- Computer Science - Data Structures and Algorithms;
- Computer Science - Machine Learning;
- Statistics - Machine Learning
- E-Print:
- AISTATS 2024 Oral