Mitigating Prediction Error of Deep Learning Streamflow Models in Large Data-Sparse Regions With Ensemble Modeling and Soft Data
Abstract
Predicting discharge in large, contiguous regions that are data-scarce or ungauged is needed to quantify the global hydrologic cycle. We show that prediction in ungauged regions (PUR) carries major, underrecognized uncertainty and is drastically more difficult than previously studied problems in which basins can be represented by neighboring or similar basins (known as prediction in ungauged basins). Although deep neural networks have demonstrated stellar performance for streamflow prediction, their performance nonetheless declined for PUR, which we benchmark here with a new, stringent region-based holdout test on a US data set of 671 basins. We tested approaches to reduce such errors, leveraging deep networks' flexibility to integrate "soft" data, such as a satellite-based soil moisture product or daily flow distributions, which improved low-flow simulations. A novel input-selection ensemble improved average performance and greatly reduced catastrophic failures. Despite these challenges, deep networks showed stronger performance metrics for PUR than traditional hydrologic models, and they appear competitive for geoscientific modeling even in data-scarce settings.
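The region-based holdout and the input-selection ensemble described above can be sketched in a model-agnostic way. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`region_holdout_folds`, `nse`, `ensemble_mean`), basin IDs, and region labels are hypothetical; the Nash-Sutcliffe efficiency (NSE) is assumed as the performance metric; and the ensemble is assumed to be a simple unweighted mean of members trained on different input sets. The key property of the holdout is that every basin in the held-out region is excluded from training, so no neighboring gauge can stand in for it.

```python
from typing import Dict, List, Sequence, Tuple

import numpy as np


def region_holdout_folds(
    basin_regions: Dict[str, str],
) -> List[Tuple[str, List[str], List[str]]]:
    """Return one (region, train_basins, test_basins) fold per region.

    Every basin in the held-out region is a test basin, and training uses only
    basins from the other regions, so no neighboring gauge informs the test.
    """
    folds = []
    for held_out in sorted(set(basin_regions.values())):
        test = sorted(b for b, r in basin_regions.items() if r == held_out)
        train = sorted(b for b, r in basin_regions.items() if r != held_out)
        folds.append((held_out, train, test))
    return folds


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency (assumed metric): 1 is a perfect fit,
    0 means the simulation is no better than the observed mean."""
    obs_arr = np.asarray(obs, dtype=float)
    sim_arr = np.asarray(sim, dtype=float)
    denom = np.sum((obs_arr - obs_arr.mean()) ** 2)
    if denom == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return float(1.0 - np.sum((sim_arr - obs_arr) ** 2) / denom)


def ensemble_mean(member_predictions: Sequence[Sequence[float]]) -> np.ndarray:
    """Unweighted mean of member hydrographs (assumed averaging rule).

    Members are assumed to be the same model trained on different input sets,
    e.g. forcings only, forcings + soil moisture, forcings + flow distributions.
    """
    stacked = np.stack([np.asarray(p, dtype=float) for p in member_predictions])
    return stacked.mean(axis=0)


if __name__ == "__main__":
    # Hypothetical basin IDs and region labels.
    basins = {"b1": "R1", "b2": "R1", "b3": "R2", "b4": "R3"}
    for region, train, test in region_holdout_folds(basins):
        print(region, "train:", train, "test:", test)

    # Hypothetical observed flow and two ensemble members for one test basin.
    observed = [1.0, 2.0, 3.0, 4.0, 5.0]
    members = [[1.2, 2.1, 2.8, 4.3, 4.6], [0.8, 1.7, 3.3, 3.9, 5.5]]
    print("member NSE:", [round(nse(observed, m), 3) for m in members])
    print("ensemble NSE:", round(nse(observed, ensemble_mean(members)), 3))
```

In this sketch, each fold's `train` list would be used to fit one model per input set, and the averaged prediction would be scored only on that fold's `test` basins. Averaging members trained on different inputs is one plausible way to dampen a single member's failure in an unseen region; the paper's exact ensemble construction may differ.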
- Publication: Geophysical Research Letters
- Pub Date: July 2021
- DOI:
- arXiv: arXiv:2011.13380
- Bibcode: 2021GeoRL..4892999F
- Keywords: LSTM; ungauged regions; deep learning; CAMELS; benchmark; data scarce; Computer Science - Machine Learning; Computer Science - Artificial Intelligence
- E-Print: Geophysical Research Letters, 2021