Using Machine Learning to Automate Trend Model Evaluations for Large-scale, Multi-site Water-quality Trend Assessments
Abstract
High-performance computing and multisource databases, like the Water Quality Portal (https://www.waterqualitydata.us/), are more accessible than ever and allow the estimation of thousands of site-specific water-quality trends. Trends computed as part of regional and national assessments provide researchers with the capacity to show how concentrations or fluxes of constituents such as salinity or nutrients have changed over time across large geographic areas, and provide insight on progress, or lack thereof, toward attaining water-quality goals. However, these large-scale assessments produce a burdensome number of fitted trend models, each requiring manual evaluation of model fit and assumptions. Common solutions to this predicament are to rely on simpler non-parametric approaches (e.g., the Mann-Kendall test and Sen slope) with fewer assumptions to evaluate, to use a predefined set of metrics and thresholds selected by the analyst to decide which models to retain or reject, or to not evaluate model assumptions at all. An alternative approach is to leverage previously completed manual evaluations and develop a supervised machine learning model that associates "accept/reject" decisions made by knowledgeable reviewers with quantitative metrics derived from trend model output. We used over 6,000 trend models, representing a variety of water-quality constituents and fit using the Weighted Regressions on Time, Discharge, and Season (WRTDS) model, to calibrate three machine learning (ML) models: logistic regression, discriminant analysis, and k-nearest neighbors. These ML models predict whether a trend model should be "accepted" or "rejected" based on manual review outcomes and explanatory variables derived from the observed, estimated, and residual values. Once calibrated, the ML models were further evaluated by testing their performance on a set of trend models with manual evaluations from a different study (not used in the calibration of the ML models).
By allowing analysts of other large-scale modeling studies to focus manual review only on models not confidently accepted or rejected through ML screening, this approach leverages hundreds of hours of human expertise, providing a rapid alternative for a time-consuming though essential task.
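The screening idea can be illustrated with a toy k-nearest-neighbors classifier, one of the three ML approaches named above. This is a minimal sketch, not the study's implementation: the metric names, values, and neighbor count here are hypothetical stand-ins for the diagnostic metrics a reviewer might derive from trend model output.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k=3):
    """Classify a trend model's metric vector as 'accept'/'reject'
    by majority vote of its k nearest labeled neighbors."""
    dists = sorted(
        (math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical per-model screening metrics, e.g.
# [residual standard error, lag-1 autocorrelation of residuals],
# paired with reviewer accept/reject decisions:
train_X = [[0.1, 0.05], [0.2, 0.10], [0.9, 0.80], [0.8, 0.75]]
train_y = ["accept", "accept", "reject", "reject"]

print(knn_predict(train_X, train_y, [0.15, 0.08]))  # → accept
```

In practice, models whose predicted class is uncertain (e.g., a split neighbor vote) would be routed to manual review, which is the triage the abstract describes.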
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2022
- Bibcode: 2022AGUFM.H33A..01M