Consistent with knowledge or consistent with observation? Testing the physical realism of machine learning models in hydrology
Abstract
Machine learning models have been used in various fields of hydrology to model the statistical correlations between random variables that describe the states of environmental systems. However, these models are often derived from observation data with little or no regularization based on physical principles. Thus, a model's ability to provide physically plausible predictions is not guaranteed. In practice, a model's quality is often assessed by its prediction accuracy, which measures its ability to reproduce the observation data of a test dataset. That is, the prediction accuracy measurement depends on the data distribution of the test set. This can be problematic because the models are often adopted to predict future or unknown events, whose underlying data distributions may differ from the observation data used in testing, which undermines the validity of the prediction accuracy measurement. To address this problem, this study adopts metamorphic testing, a method from software engineering, to assess whether a model's responses to changes in the model input are consistent with domain-specific knowledge of the system being modeled. For instance, a rainfall-runoff model is expected to predict a larger runoff volume when the magnitude of precipitation increases. The proposed method is applied to test various models that are trained to predict the peak streamflow discharge of flood events in Germany. The results show that many models fail to capture the positive correlation between precipitation magnitude and peak streamflow discharge when the inputs differ from the observation data. Explainable Artificial Intelligence (XAI) methods, such as SHAP, are also adopted to analyze the basis of the inconsistent predictions. A model's prediction accuracy is found to be uncorrelated with its ability to provide physically plausible predictions.
In conclusion, this study shows that it can be useful to test whether the mechanisms learned automatically by machine learning models can generate predictions that are consistent with domain-specific knowledge of the system being modeled.
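To make the idea concrete, the following is a minimal sketch of a metamorphic test in the spirit described above. The model, features, and data here are hypothetical stand-ins (a random forest on synthetic data), not the models or datasets used in the study; the metamorphic relation checked is the one given as an example in the abstract, namely that amplifying precipitation should not decrease the predicted peak discharge.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic training data (hypothetical): columns are
# [precipitation magnitude, antecedent soil moisture] -> peak discharge.
X = rng.uniform(0, 100, size=(500, 2))
y = 0.8 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 5, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def metamorphic_consistency(model, X_test, scale=1.5):
    """Fraction of test cases where scaling precipitation up by `scale`
    does not decrease the predicted peak discharge (the expected
    physical relation)."""
    X_follow = X_test.copy()
    X_follow[:, 0] *= scale          # amplify precipitation only
    base = model.predict(X_test)     # source test case predictions
    follow = model.predict(X_follow) # follow-up test case predictions
    return float(np.mean(follow >= base))

X_test = rng.uniform(0, 100, size=(200, 2))
rate = metamorphic_consistency(model, X_test)
print(f"consistency rate: {rate:.2f}")
```

A consistency rate below 1.0 flags inputs for which the learned model violates the expected monotone relation, even though its conventional test-set accuracy may be high; inputs far from the training distribution can be generated deliberately to probe this behavior.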
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2021
- Bibcode: 2021AGUFM.H33F..08Y