PyHydroQC: A Python Package for Automating and Streamlining Aquatic Sensor Data Post Processing
Abstract
High-frequency measurements from sensors deployed in aquatic environments commonly include artifacts that do not represent the targeted environmental phenomena. Sensors are subject to fouling from environmental conditions, often exhibit drift and calibration shifts, and report anomalies and erroneous readings due to issues with datalogging, transmission, and other causes. The suitability of data for analyses and decision making often depends on subjective and tedious quality control processes consisting of manual review, adjustment, flagging, and removal of data. Data-driven and machine learning techniques have the potential to automate identification and correction of anomalous data, streamlining the quality control process. We explored and implemented approaches for automating anomaly detection and correction of aquatic sensor data in a Python package (PyHydroQC). Functions in the package apply time series regression approaches that estimate values, identify anomalies based on dynamic thresholds, and offer replacement values for correcting anomalies. Techniques were developed and tested based on several years of data from aquatic sensors deployed at multiple sites in the Logan River Observatory in northern Utah, USA. Performance was assessed against labels and corrections applied by trained technicians. Here we describe the PyHydroQC workflow for quality control and present results from the use case. Auto-Regressive Integrated Moving Average (ARIMA) models consistently performed best, and aggregating results from multiple models improved detection. The correction algorithm successfully approximated patterns beyond the scope of technician abilities. Though technician review may still be required, the quantity of data in question is significantly reduced, and using a reproducible workflow improves consistency in quality control.
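To make the estimate, threshold, and replace sequence described in the abstract concrete, the following is a minimal sketch in plain Python. It is not the PyHydroQC API: the function names (`rolling_median`, `detect_anomalies`, `correct`), the parameters (`window`, `k`, `min_width`), and the sample values are all hypothetical. A centered rolling median stands in for the ARIMA estimate, and linear interpolation stands in for the package's correction algorithm.

```python
from statistics import median


def rolling_median(values, window):
    """Centered rolling median; a simple stand-in for a model estimate (not ARIMA)."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def detect_anomalies(values, window=5, k=4.0, min_width=0.5):
    """Flag points whose residual exceeds a locally varying (dynamic) threshold.

    The threshold is k times the local median absolute residual, with a floor
    of min_width so that very quiet periods do not flag ordinary noise.
    All parameter values here are illustrative assumptions.
    """
    estimates = rolling_median(values, window)
    residuals = [abs(v - e) for v, e in zip(values, estimates)]
    local_scale = rolling_median(residuals, window)
    return [
        r > max(min_width, k * s)
        for r, s in zip(residuals, local_scale)
    ]


def correct(values, flags):
    """Replace flagged points by linear interpolation between unflagged neighbors."""
    corrected = list(values)
    good = [i for i, flagged in enumerate(flags) if not flagged]
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None and right is None:
            continue  # nothing trustworthy to interpolate from
        if left is None:
            corrected[i] = values[right]
        elif right is None:
            corrected[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            corrected[i] = values[left] + weight * (values[right] - values[left])
    return corrected


# Made-up readings with one spike at index 4.
series = [10.0, 10.1, 10.2, 10.3, 25.0, 10.5, 10.6, 10.7, 10.8, 10.9]
flags = detect_anomalies(series)
cleaned = correct(series, flags)
print(flags)
print(cleaned)
```

In this sketch the threshold widens where residuals are locally large and narrows where the series is quiet, which is the sense of "dynamic" assumed here. Flagged values are offered as replacements rather than overwritten in place, so a technician could still review them.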
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2021
- Bibcode:
- 2021AGUFM.H21B..07J