Partial Wasserstein and Maximum Mean Discrepancy distances for bridging the gap between outlier detection and drift detection
Abstract
With the rise of machine learning and deep learning based applications in practice, monitoring, i.e. verifying that these applications operate within specification, has become an important practical problem. A key aspect of this monitoring is to check whether the inputs (or intermediates) have strayed from the distribution they were validated for, since this can void the performance assurances obtained during testing. There are two common approaches for this. The perhaps more classical one is outlier detection, or novelty detection, where for a single input we ask whether it is an outlier, i.e. exceedingly unlikely to have originated from a reference distribution. The second, more recent approach, known as drift detection, considers a larger number of inputs and compares their distribution to a reference distribution (e.g. one sampled during testing). In this work, we bridge the gap between outlier detection and drift detection by comparing a given number of inputs to an automatically chosen part of the reference distribution.
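The comparison the abstract describes can be illustrated with a short sketch. The following is a minimal, hypothetical example, not the authors' implementation: it computes a partial Wasserstein cost between a small batch and a reference sample using the POT library's `ot.partial.partial_wasserstein2`, alongside a plain NumPy estimate of the (biased) RBF-kernel MMD. The data, the transported mass fraction `m`, and the kernel bandwidth are illustrative assumptions.

```python
# Illustrative sketch only: compares a small batch of inputs against a
# reference sample with (i) a partial Wasserstein cost, which matches the
# batch to a best-fitting part of the reference, and (ii) an RBF-kernel
# MMD estimate. Data, mass fraction m, and bandwidth are assumed choices.
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(500, 2))  # e.g. sampled during testing
batch = rng.normal(0.5, 1.0, size=(20, 2))       # incoming inputs to monitor

# --- Partial Wasserstein: only a fraction m of the total mass must be
# transported, so the batch is compared against part of the reference.
M = ot.dist(batch, reference)                     # squared Euclidean costs
a = np.full(len(batch), 1.0 / len(batch))         # uniform batch weights
b = np.full(len(reference), 1.0 / len(reference)) # uniform reference weights
m = 0.9                                           # transported mass (assumed)
pw_cost = ot.partial.partial_wasserstein2(a, b, M, m=m)

# --- MMD with an RBF kernel (biased V-statistic estimate).
def rbf_mmd2(x, y, sigma=1.0):
    def k(u, v):
        d2 = ((u[:, None, :] - v[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

print(f"partial Wasserstein cost (m={m}): {pw_cost:.4f}")
print(f"biased RBF-MMD^2 estimate: {rbf_mmd2(batch, reference):.4f}")
```

Intuitively, in such a setup a batch of size one resembles an outlier-style score while a large batch resembles a classical two-sample drift test; how the transported fraction is chosen automatically is the subject of the paper itself.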
- Publication:
- arXiv e-prints
- Pub Date:
- June 2021
- DOI:
- 10.48550/arXiv.2106.12893
- arXiv:
- arXiv:2106.12893
- Bibcode:
- 2021arXiv210612893V
- Keywords:
- Computer Science - Machine Learning;
- Statistics - Machine Learning