Automated Machine Learning Approach to Supervised Anomaly Detection from Critical Zone Watershed Sensor-Generated Time Series Data
Abstract
Critical zone time series datasets frequently contain anomalous patterns. In hydrological time series data from a watershed, such patterns may indicate unusual hydrological events, and detecting them can inform decisions on planning, operating, and managing water resources. In the CZNet Big Data project, we focus on a specific type of pattern anomaly called peak anomalies. In the watershed data used in our research, we have identified the following peak anomaly types: "skyrocketing peaks", "plummeting peaks", "flat plateaus", "flat sinks", and "phantom peaks". Our work uses deep learning classifiers for anomaly detection. State-of-the-art deep learning classifiers are typically large and complex, requiring substantial memory and long training times, and for a given task they can be either suboptimal or overkill. We therefore use an automated machine learning (autoML) framework to find an optimal model. Specifically, we use neural architecture search and meta-learning techniques to build, at runtime, an optimal neural anomaly classifier tailored to each use case. The classifiers we use are supervised algorithms, which need labeled anomaly instances for training, validation, and testing. Unfortunately, such labels are rarely available in raw time series data and are typically difficult to generate: manual labeling is time-consuming, labor-intensive, and error-prone. To deal with this problem, we generate synthetic time series data from the original unlabeled data using a generative adversarial network and inject pre-identified peak anomalies at random locations in the synthetic data. This synthetic labeled data is then used to train the autoML classifier, which is then tested on real time series data.
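The labeling step described in the abstract, injecting known peak anomalies at random locations so that labels come for free, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `inject_peak_anomalies`, its parameters, and the triangular spike used as a stand-in for a "skyrocketing peak" are all assumptions made for illustration, and the input here is any numeric series rather than the output of a generative adversarial network.

```python
import random
import statistics


def inject_peak_anomalies(series, n_anomalies=3, width=5, scale=4.0, seed=0):
    """Add triangular spikes at random, non-overlapping positions.

    Returns (new_series, labels), where labels[i] == 1 marks an injected point.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    out = [float(v) for v in series]
    labels = [0] * len(out)

    # Spike height relative to the spread of the input series.
    amplitude = scale * statistics.pstdev(out)

    # Triangular template of length `width`, positive everywhere, peaking mid-window.
    half = (width + 1) / 2
    template = [amplitude * (1 - abs(i + 1 - half) / half) for i in range(width)]

    # Split the series into slots of length `width` and pick distinct slots,
    # so injected anomalies never overlap.
    slots = rng.sample(range(len(out) // width), n_anomalies)
    for slot in slots:
        start = slot * width
        for offset, bump in enumerate(template):
            out[start + offset] += bump
            labels[start + offset] = 1
    return out, labels
```

Calling `inject_peak_anomalies(synthetic, n_anomalies=3, width=5)` on a 200-point series returns the modified series together with a 0/1 label list in which 15 points are marked as anomalous. Because the injection positions are known, the labels need no manual annotation, which is the property the abstract relies on.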
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2022
- Bibcode: 2022AGUFM.H22P1031H