Assessing the performance and robustness of two conceptual rainfall-runoff models on a worldwide sample of watersheds
To assess the predictive performance, robustness and generality of watershed-scale hydrological models, we conducted a detailed multi-objective evaluation of two conceptual rainfall-runoff models of differing complexity: the GRX model, based on the GR4J model, and the MRX model, based on the MORDOR model (with, respectively, 5 and 11 free parameters in the rainfall-runoff module, and 4 and 11 free parameters in the snow module). These models were compared on a large sample of 2050 watersheds worldwide. Our results, based on the three components of the Kling-Gupta Efficiency metric (KGE), indicate that, when calibrated with KGE, both models provide on average similar levels of performance in evaluation for water balance (mean bias lower than 2%), time-series variability (mean variability bias lower than 2%) and temporal correlation (mean correlation around 0.83). Further, both models clearly suffer from a lack of robustness when simulating water balance, with a significant increase in the proportion of biased simulations over the evaluation periods (for 80% of the watersheds, the absolute bias is lower than 2% in calibration but lower than 20% in evaluation). Simulation performance depends more on the hydro-meteorological conditions of a given period than on the complexity of the model structure. We also show that long-term aggregate statistics (computed over the whole period) can fail to reveal considerable sub-period variability in model performance, thereby providing an inaccurate diagnostic assessment of predictive model performance. Typically, the median absolute bias is lower than 8% in evaluation, but the median maximum bias can be as high as 50% within a sub-period, for both models, when calibrated with KGE.
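As an illustration of the diagnostics mentioned above, the following minimal Python sketch computes the three KGE components and a per-sub-period bias. It assumes the commonly used formulation of KGE (correlation r, variability ratio alpha = sigma_sim / sigma_obs, bias ratio beta = mu_sim / mu_obs); the exact variant, the sub-period definition and the data used in the study are not specified here. The function names `kge_components` and `subperiod_bias` and the toy series are hypothetical.

```python
import numpy as np

def kge_components(obs, sim):
    """Return (kge, r, alpha, beta) under the assumed standard KGE formulation."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]      # temporal correlation
    alpha = sim.std() / obs.std()        # variability ratio
    beta = sim.mean() / obs.mean()       # water-balance (bias) ratio
    kge = 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
    return kge, r, alpha, beta

def subperiod_bias(obs, sim, n_sub):
    """Relative bias (mean_sim / mean_obs - 1) on n_sub consecutive sub-periods.

    Splitting into equal consecutive chunks is an assumption made for this sketch.
    """
    obs_chunks = np.array_split(np.asarray(obs, dtype=float), n_sub)
    sim_chunks = np.array_split(np.asarray(sim, dtype=float), n_sub)
    return np.array([s.mean() / o.mean() - 1.0
                     for o, s in zip(obs_chunks, sim_chunks)])

# Toy series (not real data): the simulation overestimates the first half
# and underestimates the second half, so the errors cancel over the whole period.
obs = [1, 3, 1, 3, 2, 4, 2, 4]
sim = [2, 4, 2, 4, 1, 3, 1, 3]

kge, r, alpha, beta = kge_components(obs, sim)
biases = subperiod_bias(obs, sim, n_sub=2)
print(f"whole period: beta = {beta:.2f}")       # bias ratio of 1, i.e. no overall bias
print("sub-period biases:", np.round(biases, 2))  # +50% and about -33%
```

In this toy case the whole-period bias ratio is exactly 1 while the two sub-periods are biased by +50% and roughly -33%, which is the kind of compensation that long-term aggregate statistics can hide.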