Identity testing under label mismatch
Abstract
Testing whether the observed data conforms to a purported model (probability distribution) is a basic and fundamental statistical task, and one that is by now well understood. However, the standard formulation, identity testing, fails to capture many settings of interest; in this work, we focus on one such natural setting, identity testing under promise of permutation. In this setting, the unknown distribution is assumed to be equal to the purported one, up to a relabeling (permutation) of the model: however, due to a systematic error in the reporting of the data, this relabeling may not be the identity. The goal is then to test identity under this assumption: equivalently, whether this systematic labeling error led to a data distribution statistically far from the reference model.
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2021
- DOI:
- arXiv:
- arXiv:2105.01856
- Bibcode:
- 2021arXiv210501856C
- Keywords:
-
- Mathematics - Statistics Theory;
- Computer Science - Discrete Mathematics;
- Computer Science - Data Structures and Algorithms