Explainable transfer learning for generalizable subgrid-scale modeling
Abstract
Recent studies have found promising results using machine learning (ML) techniques such as deep neural networks (NNs) to improve climate models, e.g., by better representing subgrid-scale (SGS) processes. However, applying NN-enhanced climate models to a different climate system, for example one with increased radiative forcing, can lead to inaccurate and even unstable simulations. This is because NNs and similar techniques do not generalize out of distribution, i.e., they do not extrapolate. Transfer learning (TL), which involves targeted re-training of some layer(s) with a small amount of data from the new system, offers a practical solution to this problem, as shown in several recent studies focused on simple systems. However, the general understanding of TL, drawn mainly from applications involving static images, does not carry over to the multi-scale, nonlinear, spatio-temporal data of the climate system. Effective TL for climate applications requires knowing 1) how to optimally re-train NNs and 2) what physics are learned during TL, and how. Here, we present novel analyses and a theoretical framework addressing (1) and (2) for problems such as SGS modeling. Our approach combines spectral analyses of multi-scale nonlinear dynamical systems with spectral analyses of convolutional NNs (CNNs), revealing physical connections between the systems' dynamics and the inner workings of the NNs. Integrating these analyses, we introduce a general framework that identifies the optimal re-training procedure for a given problem based on physics and NN theory (figure below). As a test case, we explain the physics of TL in SGS modeling of eddy momentum and heat fluxes using CNNs in several setups of 2D turbulence, two-layer quasi-geostrophic turbulence, and Rayleigh-Bénard convection. These systems represent a broad range of complexities in terms of physics, spatial scales, inhomogeneity, and anisotropy.
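To make the idea of targeted re-training concrete, the following is a minimal, hypothetical sketch (not the authors' code): a toy two-layer network stands in for a CNN, the deeper layer is frozen, and only the shallow layer is re-trained by gradient descent on a small sample from a shifted "new" system. The network sizes, learning rate, and target mapping are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def forward(W1, W2, X):
    # Toy two-layer network: y = relu(X @ W1.T) @ W2.T
    return relu(X @ W1.T) @ W2.T

# "Pre-trained" weights on the base system (random here, for brevity)
W1 = rng.standard_normal((8, 4)) * 0.3   # shallow layer (will be re-trained)
W2 = rng.standard_normal((1, 8)) * 0.3   # deep layer (frozen during TL)

# Small re-training set from the shifted ("new climate") system
X_new = rng.standard_normal((32, 4))
y_new = np.sin(X_new).sum(axis=1, keepdims=True)  # assumed target mapping

mse_init = float(np.mean((forward(W1, W2, X_new) - y_new) ** 2))

lr = 1e-2
for _ in range(200):
    H = relu(X_new @ W1.T)            # hidden activations, (32, 8)
    err = forward(W1, W2, X_new) - y_new
    # Backpropagate through the frozen W2 into W1 only
    dH = (err @ W2) * (H > 0)         # (32, 8)
    dW1 = dH.T @ X_new / len(X_new)   # (8, 4)
    W1 -= lr * dW1                    # W2 is never updated

mse = float(np.mean((forward(W1, W2, X_new) - y_new) ** 2))
print(f"MSE on new-system sample: {mse_init:.3f} -> {mse:.3f}")
```

The only choice being illustrated is *which* parameters receive gradients during re-training; everything else (data, targets, architecture) is placeholder.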
Our analyses show that in these cases, the shallowest convolution layers are often the best to re-train, which is consistent with our physics-guided framework but contradicts the common wisdom guiding TL in the ML literature. Our analyses further demonstrate that the convolution kernels learned during the original training and re-training of CNNs for SGS modeling are combinations of low- and high-pass spectral filters and Gabor filters. This work provides a new avenue for optimal and explainable TL, and a step toward fully explainable NNs, for a wide range of applications in climate science, particularly SGS modeling.
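The kind of kernel spectral analysis described above can be sketched as follows: zero-pad a small convolution kernel, take its 2D FFT, and inspect where the spectral magnitude peaks. A low-pass (averaging) kernel peaks at zero frequency, while a Gabor-like kernel peaks off-center at its carrier frequency. This is an illustrative sketch with hand-built kernels, not the learned CNN kernels from the study.

```python
import numpy as np

def kernel_spectrum(kernel, n=64):
    """Zero-pad a small kernel to n x n and return |FFT|, DC-centered."""
    padded = np.zeros((n, n))
    k = kernel.shape[0]
    padded[:k, :k] = kernel
    return np.abs(np.fft.fftshift(np.fft.fft2(padded)))

# A 5x5 averaging kernel (low-pass) ...
low_pass = np.ones((5, 5)) / 25.0

# ... and a 5x5 Gabor kernel: Gaussian envelope times a cosine carrier
xx, yy = np.meshgrid(np.arange(5) - 2, np.arange(5) - 2)
gabor = np.exp(-(xx**2 + yy**2) / 4.0) * np.cos(2 * np.pi * xx / 3.0)

for name, kern in [("low-pass", low_pass), ("gabor", gabor)]:
    spec = kernel_spectrum(kern)
    ci, cj = np.unravel_index(np.argmax(spec), spec.shape)
    # Offset of the spectral peak from the center (zero frequency):
    # (0, 0) means low-pass; a nonzero offset means band-pass (Gabor-like)
    print(name, "peak offset from zero frequency:", (ci - 32, cj - 32))
```

The padded FFT size (64) and kernel parameters are arbitrary; the point is that the location of the spectral peak separates low-pass, high-pass, and Gabor-type filters.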
Part of the results are presented in https://arxiv.org/abs/2206.03198

Publication: AGU Fall Meeting Abstracts
Pub Date: December 2022
Bibcode: 2022AGUFMNG16A..06H