Using cloud-friendly data format in earth system models
Abstract
The volume of data produced by earth system models is increasing rapidly every year, driven by advances in computing efficiency and increases in model resolution and complexity. To produce and manage these data volumes efficiently, both parallel input/output (I/O) and data compression are necessary. Unfortunately, NetCDF, the data format commonly used in earth system models, does not support writing compressed output files in parallel without significant overhead. Zarr, by contrast, is a cloud-friendly data format that provides an implementation of parallel (multi-threaded or multi-process) I/O for chunked, compressed, N-dimensional arrays. We created C wrappers for the Z5 library, a C++ implementation of Zarr, and integrated it into an application-level parallel I/O library used by the Community Earth System Model (CESM). Our work provides insight into whether Zarr/Z5 can improve parallel read, write, and compression performance in earth system models on traditional storage systems compared with the existing NetCDF formats. We will describe and analyze our results and discuss progress toward our ultimate goal of running CESM simulations in the cloud and writing output in a cloud-storage-friendly data format.
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2019
- Bibcode: 2019AGUFMIN13C0728X
- Keywords: 1626 Global climate models; GLOBAL CHANGE; 1920 Emerging informatics technologies; INFORMATICS; 1932 High-performance computing; INFORMATICS; 1994 Visualization and portrayal; INFORMATICS