NCAR CESM Large Ensemble Numerical Simulation Data on AWS
Abstract
The National Center for Atmospheric Research (NCAR), in collaboration with Amazon Web Services (AWS) Public Datasets Program and the Amazon Sustainability Data Initiative (ASDI), has made available ~100 TB of cloud-optimized, analysis-ready data from the NCAR Community Earth System Model (CESM) Large Ensemble (LENS) in AWS S3. The data are freely available for research, education, and commercial purposes.
CESM LENS includes a 40-member ensemble of climate simulations using historical observations for the period 1920-2005 and assuming the RCP8.5 greenhouse gas concentration scenario for 2005-2100. The data comprise both surface (2D) and volumetric (3D) variables in the atmosphere, ocean, land, and sea-ice domains, at monthly, daily, and 6-hourly time resolutions. LENS data have traditionally been publicly available through the NCAR Climate Data Gateway for NetCDF file download or via web services, and authorized users of the NCAR high-performance computing (HPC) resources have also been able to analyze the data without downloading. However, the size of the dataset, the use of magnetic tape storage for a significant portion of it, and restricted access to NCAR HPC have made large-scale use and analysis of these data difficult, particularly at higher time resolutions. NCAR has copied a substantial subset of CESM LENS data to Amazon S3, focusing on the most useful fields. To optimize the performance of large-scale analytics, we have structured the data in the cloud according to the Zarr storage specification as chunks of ~100 MB, which can be read in parallel using Dask. Virtual datasets aggregating multiple Zarr stores can be accessed through the Python Xarray library. To encourage computing in place rather than data downloads, we have made available a Jupyter Notebook providing examples of basic analyses. The goals of this project include enabling access and cloud-based analysis by a broader community, assessing the performance of the Zarr chunking approach, and evaluating the feasibility of lossy compression using bit grooming or other methods. This presentation will report on the current status and results of the ongoing project, and invite feedback and collaboration from interested parties. The LENS data can be accessed via https://doi.org/10.26024/wt24-5j82.
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2019
- Bibcode:
- 2019AGUFMIN12A..08D
- Keywords:
- 1626 Global climate models;
- GLOBAL CHANGE;
- 1920 Emerging informatics technologies;
- INFORMATICS;
- 1932 High-performance computing;
- INFORMATICS;
- 1994 Visualization and portrayal;
- INFORMATICS
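
The abstract describes Zarr chunks of roughly 100 MB. As a rough illustration of what that target implies, the sketch below estimates how many time steps of a gridded field fit in one chunk. The grid size (192 x 288), the 30 vertical levels, the 4-byte float32 values, and the function names are assumptions chosen for illustration; they are not taken from the abstract and may not match the actual chunk layout of the dataset.

```python
import math


def timesteps_per_chunk(nlat, nlon, nlev=1, itemsize=4, target_bytes=100_000_000):
    """Whole time steps of an (nlev, nlat, nlon) field that fit in target_bytes.

    Hypothetical helper: chunks span the full spatial grid and are split
    only along the time axis. Uncompressed sizes are used.
    """
    bytes_per_step = nlev * nlat * nlon * itemsize
    return max(1, target_bytes // bytes_per_step)


def chunk_count(ntime, steps_per_chunk):
    """Number of chunks needed to cover ntime time steps."""
    return math.ceil(ntime / steps_per_chunk)


# Assumed grid: 192 latitudes x 288 longitudes, float32 values.
steps_2d = timesteps_per_chunk(192, 288)           # surface (2D) variable
steps_3d = timesteps_per_chunk(192, 288, nlev=30)  # volumetric (3D) variable, 30 assumed levels

# Monthly data for 1920-2005 inclusive: 86 years x 12 months.
n_months = 86 * 12
print(steps_2d, steps_3d, chunk_count(n_months, steps_2d))
```

Under these assumptions a 2D monthly variable for the historical period would occupy only a few chunks per ensemble member, while 3D and sub-daily variables would need far more, which is where parallel reads with Dask matter most.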