A data-driven approach to generate past GRACE-like terrestrial water storage solution by calibrating the land surface model simulations
The Gravity Recovery and Climate Experiment (GRACE) satellites give hydrologists and geoscientists an unprecedented view of terrestrial water storage (TWS) variations at continental to global scales. However, few reliable datasets describe TWS variations before GRACE observations became available (pre-2002). To fill this gap, we develop an approach that calibrates TWS anomaly (TWSA) data for past decades using the available GRACE solution and land surface model simulations, and we present a case study in the Nile River basin. Two ensemble learning algorithms, Random Forest (RF) and eXtreme Gradient Boosting (XGB), are each combined with a spatially moving window structure to build a reconstruction model. The reconstructed TWSA are validated against a precipitation-evapotranspiration index as well as other GRACE-based reconstructed TWSA datasets. Results show that the XGB model performs slightly better than the RF model in reconstructing GRACE TWSA data. The TWSA produced by the two ensemble learning algorithms are comparable to each other and better than the other reconstructed GRACE-like datasets examined, and they are well correlated with the original GRACE solution and with past precipitation-evapotranspiration series. Profile soil moisture and groundwater storage contribute significantly to both the RF and XGB models, but their variable importance values show different spatial patterns in the two models. Further experiments are expected to investigate how human-induced factors contribute to terrestrial water storage dynamics, especially in intensely managed basins. Rather than modifying the structure and inputs of land surface models, this study provides an alternative way of improving the TWSA estimates of global land surface models and extending the time range of GRACE datasets.
The experiments are expected to promote and enrich the integration of physical and machine-learning models for optimal simulations in geoscience research.
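
The abstract does not spell out how the spatially moving window is used, so the following is only a minimal sketch under one plausible reading: for each target grid cell, training pairs are pooled from all cells inside a square window centred on it, across all months with a GRACE value, and a regressor such as RF or XGB would then be fitted on that pooled set. The function names `window_cells` and `pool_training_pairs`, the `half_width` parameter, and the nested-list data layout are hypothetical and are not taken from the study.

```python
from typing import List, Optional, Sequence, Tuple

Cell = Tuple[int, int]
# Hypothetical layout: predictors[t][r][c] is a tuple of land surface model
# variables for month t at grid cell (r, c); twsa[t][r][c] is the GRACE TWSA
# value for the same month and cell, or None where GRACE has no data.
Predictors = Sequence[Sequence[Sequence[Tuple[float, ...]]]]
Targets = Sequence[Sequence[Sequence[Optional[float]]]]


def window_cells(n_rows: int, n_cols: int, row: int, col: int,
                 half_width: int) -> List[Cell]:
    """Indices of the square window centred on (row, col), clipped to the grid."""
    r0, r1 = max(0, row - half_width), min(n_rows - 1, row + half_width)
    c0, c1 = max(0, col - half_width), min(n_cols - 1, col + half_width)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


def pool_training_pairs(predictors: Predictors, twsa: Targets,
                        row: int, col: int, half_width: int = 1
                        ) -> List[Tuple[Tuple[float, ...], float]]:
    """Collect (predictor vector, TWSA) pairs from every cell in the window.

    Assumption: samples from neighbouring cells are pooled across all months;
    cell-months without a GRACE value are skipped.
    """
    n_rows, n_cols = len(twsa[0]), len(twsa[0][0])
    cells = window_cells(n_rows, n_cols, row, col, half_width)
    pairs = []
    for t in range(len(twsa)):
        for r, c in cells:
            y = twsa[t][r][c]
            if y is not None:
                pairs.append((predictors[t][r][c], y))
    return pairs
```

Under this reading, the pooled pairs for each window would serve as the training set of a local model, and the fitted model would then be applied to the land surface model predictors from the pre-2002 period to estimate GRACE-like TWSA for the centre cell.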