Performance of machine learning algorithms over high dimensional regression in quantifying water quality parameters from high frequency spectrometry sensors
Abstract
Recent advances in high-frequency water quality measurement have aided in quantifying diel and event dynamics and in real-time monitoring of watersheds. The data gap in water quality records, which arises because discrete samples are collected infrequently relative to continuously measured flow, contributes to the knowledge gap in understanding biogeochemical processes. Spectrometry sensors can quantify water quality parameters by measuring light absorbance instantaneously, in the field or in the lab. We used high-frequency UV-Vis spectrometry sensors to obtain nitrate (NO3-N) and dissolved organic carbon (DOC) concentrations at 30-minute intervals over two years in agricultural, urban, and forested watersheds. High-dimensional regression techniques such as partial least squares regression, lasso regression, and stepwise regression have been used to predict variables from UV-Vis spectrometer measurements. These regression tools generalize the relationship between the covariates and the dependent variable through a set of linear relations, and they are limited by the assumptions of homoscedasticity and independence of observations. Machine learning algorithms such as support vector machines and random forests, by contrast, can learn from the data without such constraints and can model complex nonlinearities. The choice of the best method among these is largely data driven and may vary with the choice of goodness-of-fit measures. This study analyzes the reliability of these methods in quantifying multiple water quality parameters from the spectrometer measurements.
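The point that the preferred method can change with the goodness-of-fit measure can be illustrated with a small sketch. The abstract does not name the measures used in the study, so the three below (RMSE, MAE, and Nash-Sutcliffe efficiency) are assumptions chosen because they are common in hydrology. The observed and predicted values are made-up numbers for illustration only, not data from this work, and `pred_a` and `pred_b` stand in for two hypothetical calibration methods.

```python
import math


def rmse(obs, pred):
    """Root mean squared error: penalizes large errors heavily."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error: weights all errors linearly."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean of obs."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


# Hypothetical concentrations (mg/L); illustrative values only.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_a = [1.0, 2.0, 3.0, 4.0, 7.0]  # exact except for one large miss
pred_b = [1.5, 2.5, 3.5, 4.5, 5.5]  # consistently off by a small amount

for name, pred in (("method A", pred_a), ("method B", pred_b)):
    print(name, "RMSE:", rmse(obs, pred), "MAE:", mae(obs, pred), "NSE:", nse(obs, pred))
```

With these numbers, MAE ranks method A ahead of method B, while RMSE and NSE rank method B ahead, because the squared-error measures are dominated by the single large miss. Reporting more than one measure makes this kind of disagreement visible.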
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2019
- Bibcode: 2019AGUFM.H43Q2315S
- Keywords: 1848 Monitoring networks; HYDROLOGY; 1871 Surface water quality; HYDROLOGY; 1879 Watershed; HYDROLOGY; 1880 Water management; HYDROLOGY