Cloud Computing Tools for Developing LASSO (L1) Regularized Remote Sensing Models for Optically Complex Waters
Abstract
Remote sensing data are used extensively to monitor water quality parameters such as clarity, temperature, and chlorophyll-a (chl-a). Many studies develop empirical water quality models from in situ data collected coincident with satellite data collection, using approaches such as multiple linear regression and stepwise linear regression. However, these approaches, which require modelers to select the parameters used in the model, may not be well suited to optically complex waters, where interference from substances such as suspended solids and dissolved organic matter complicates remote sensing of the spectral signal of interest. Recent work has demonstrated the use of machine learning approaches, which explore large feature spaces and do not require parameter selection. However, these methods risk overfitting and, because of the large number of features, produce models that are not explainable. This limits their application to out-of-sample conditions. We explore the use of LASSO (Least Absolute Shrinkage and Selection Operator) regularization, more commonly known as L1 regularization, to improve model performance and to produce models parsimonious enough to allow interpretation and explanation. We demonstrate this approach with a case study on chl-a in Utah Lake, Utah, USA, an optically complex water body, and compare model terms to those in the literature. We extend recent literature by using non-coincident data for model creation, and we investigate how the proximity in time of satellite images to in situ sampling affects model performance. We discuss trade-offs between interpretability and model performance using L1 regularization as a tool. For our selected model, with 5 parameters, the root mean square error (RMSE) was 28 µg/L, lower than values reported in the literature for Utah Lake.
Some model terms match those in the literature while others differ, suggesting this approach is useful for model development in optically complex water bodies where standard model terms may not be optimal. We provide Google Earth Engine (GEE) cloud computing tools for compiling near-coincident data pairs and implementing L1 model creation; these tools can be used by anyone with in situ data for other water bodies and water quality parameters.
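To illustrate how L1 regularization yields a parsimonious model, the following is a minimal sketch of LASSO fitted by coordinate descent with soft-thresholding. It is not the authors' GEE tooling: the function names, the toy design matrix, the response values, and the penalty `alpha` are all hypothetical and chosen only to show how weak terms are driven exactly to zero while a strong term is kept and shrunk.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n = len(X)
    p = len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Hypothetical toy data: three candidate features (columns) and four samples.
# The response depends strongly on feature 0 and only weakly on feature 1.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0],
    [-1.0, 1.0, 1.0],
    [-1.0, -1.0, 1.0],
]
y = [3.1, 2.9, -2.9, -3.1]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
print(weights)
```

In this toy case only the first coefficient remains nonzero, and the penalty shrinks it from 3.0 to about 2.5. Raising `alpha` removes more terms at the cost of fit, which is the interpretability versus performance trade-off the abstract describes.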
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2022
- Bibcode:
- 2022AGUFMIN16A..01C