Sequential anomaly identification with observation control under generalized error metrics
Abstract
The problem of sequential anomaly detection and identification is considered, where multiple data sources are simultaneously monitored and the goal is to identify in real time those, if any, that exhibit ``anomalous" statistical behavior. An upper bound is postulated on the number of data sources that can be sampled at each sampling instant, but the decision maker selects which ones to sample based on the already collected data. Thus, in this context, a policy consists not only of a stopping rule and a decision rule that determine when sampling should be terminated and which sources to identify as anomalous upon stopping, but also of a sampling rule that determines which sources to sample at each time instant subject to the sampling constraint. Two distinct formulations are considered, which require control of different, ``generalized" error metrics. The first one tolerates a certain user-specified number of errors, of any kind, whereas the second tolerates distinct, user-specified numbers of false positives and false negatives. For each of them, a universal asymptotic lower bound on the expected time for stopping is established as the error probabilities go to 0, and it is shown to be attained by a policy that combines the stopping and decision rules proposed in the full-sampling case with a probabilistic sampling rule that achieves a specific long-run sampling frequency for each source. Moreover, the optimal to a first order asymptotic approximation expected time for stopping is compared in simulation studies with the corresponding factor in a finite regime, and the impact of the sampling constraint and tolerance to errors is assessed.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2024
- DOI:
- 10.48550/arXiv.2412.04693
- arXiv:
- arXiv:2412.04693
- Bibcode:
- 2024arXiv241204693T
- Keywords:
-
- Mathematics - Statistics Theory
- E-Print:
- 57 pages, 4 figures, 1 table