Causal discovery in the geosciences-Using synthetic data to learn how to interpret results
Abstract
Causal discovery algorithms based on probabilistic graphical models have recently emerged in geoscience applications for the identification and visualization of dynamical processes. The key idea is to learn the structure of a graphical model from observed spatio-temporal data, thus finding pathways of interactions in the observed physical system. Studying those pathways allows geoscientists to learn subtle details about the underlying dynamical mechanisms governing our planet. Initial studies using this approach on real-world atmospheric data have shown great potential for scientific discovery. However, in these initial studies no ground truth was available, so that the resulting graphs have been evaluated only by whether a domain expert thinks they seemed physically plausible. The lack of ground truth is a typical problem when using causal discovery in the geosciences. Furthermore, while most of the connections found by this method match domain knowledge, we encountered one type of connection for which no explanation was found. To address both of these issues we developed a simulation framework that generates synthetic data of typical atmospheric processes (advection and diffusion). Applying the causal discovery algorithm to the synthetic data allowed us (1) to develop a better understanding of how these physical processes appear in the resulting connectivity graphs, and thus how to better interpret such connectivity graphs when obtained from real-world data; (2) to solve the mystery of the previously unexplained connections.
- Publication:
-
Computers and Geosciences
- Pub Date:
- February 2017
- DOI:
- 10.1016/j.cageo.2016.10.008
- Bibcode:
- 2017CG.....99...50E
- Keywords:
-
- Probabilistic graphical model;
- Structure learning;
- Causal discovery;
- Information flow;
- Geoscience;
- Climate;
- Atmospheric science