The Big Climate Data Pipeline (BCDP): An Open-Source Python Library to Analyze High-Resolution Climate Models and Satellite Observations in Amazon Cloud environments
Abstract
Evaluating climate models against observations remains an important task for many climate assessment reports, such as the National Climate Assessment (NCA). The Jet Propulsion Laboratory (JPL) has previously contributed to these assessments through the Regional Climate Model Evaluation System (RCMES) project, which facilitated the development of an open-source Python library, the Apache Open Climate Workbench (OCW), in 2013. Using Python libraries from the PyData stack such as xarray and dask, we have since created a more modern replacement: the Big Climate Data Pipeline (BCDP), which provides a more flexible API for configuring climate model evaluations and inherently supports Big Data use cases. We have also demonstrated that for single-threaded use cases with monthly regional climate simulations, BCDP provides significant performance benefits over OCW because it uses xarray's lazy-evaluation data model. While many evaluations involve very simple intercomparison metrics such as model bias and root mean square error (RMSE), BCDP also supports seamless integration of custom use cases by providing an extensible object-oriented API for each component of the data processing pipeline. We will demonstrate this capability by examining a use case that requires calculating an extreme precipitation index for high-resolution CMIP6 simulations. Results will be shown from evaluation runs of BCDP deployed on a Kubernetes cluster running on Amazon Elastic Kubernetes Service (EKS).
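For readers unfamiliar with the two intercomparison metrics named above, the following is a minimal sketch of model bias and RMSE over paired model and observation values. It uses only the Python standard library and does not reflect BCDP's or OCW's actual API; the function names `bias` and `rmse` and the sample values are assumptions made purely for illustration.

```python
import math


def bias(model, obs):
    """Mean difference (model minus observation) over paired values."""
    if len(model) != len(obs) or not model:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(m - o for m, o in zip(model, obs)) / len(model)


def rmse(model, obs):
    """Root mean square error over paired values."""
    if len(model) != len(obs) or not model:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = sum((m - o) ** 2 for m, o in zip(model, obs))
    return math.sqrt(squared_errors / len(model))


# Hypothetical paired values (e.g. precipitation at four grid cells).
model_values = [2.0, 3.5, 1.0, 4.0]
obs_values = [1.5, 3.0, 2.0, 4.0]

print("bias:", bias(model_values, obs_values))
print("rmse:", rmse(model_values, obs_values))
```

In this sample the positive and negative errors cancel, so the bias is zero while the RMSE is not, which is one reason evaluations typically report both metrics together.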
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2020
- Bibcode: 2020AGUFMIN033..07G
- Keywords: 3360 Remote sensing; ATMOSPHERIC PROCESSES; 1626 Global climate models; GLOBAL CHANGE; 1920 Emerging informatics technologies; INFORMATICS; 1932 High-performance computing; INFORMATICS