Compression Support in NetCDF Zarr
Abstract
The Unidata NetCDF group has extended the netcdf-c library to provide access to cloud storage using the Zarr data and storage model. This extension maps a subset of the full netCDF Enhanced (aka netCDF-4) data model to Zarr, which allows programs to use the NetCDF API to read and write data in the Zarr storage format, including data held in cloud storage such as Amazon S3. Data compression is essential for good space and time performance of data access, especially when accessing remote cloud storage. NetCDF currently supports compression based on the HDF5 compression representation, but Zarr uses a different representation. The problem to be solved is therefore how to unify Zarr compression with HDF5 compression. Two constraints are imposed on the unification. First, HDF5 compressor information must be representable in Zarr format and vice versa. Second, HDF5 compressor implementations must be reused at run-time, so that the compressors do not have to be rewritten. For both HDF5 and Zarr, compression metadata is associated with an array. This metadata specifies the compression algorithm id and a set of parameters. HDF5 uses integer values for both the id and the parameters. Zarr uses a string formatted as a JSON dictionary, with the id and parameters stored under dictionary keys. The chosen solution extends the HDF5 approach so that compressor implementations include extra, Zarr-specific functions for converting filter data between the HDF5 integer representation and the Zarr JSON representation. HDF5 implements each compressor as a shared library that can be dynamically loaded at run-time. These libraries export well-known entry points from which the netcdf-c library can extract the information needed to use the compressor. The unified solution extends these libraries to export additional entry points that provide the Zarr-related information needed to convert between HDF5 and Zarr.
A unique feature of this solution is that the Zarr conversion implementation can be in the same shared library as the HDF5 compressor implementation, or it can be in an entirely separate shared library. The latter case supports incrementally extending Zarr support to new filters without having to modify the existing HDF5 compressor implementations.
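As a rough illustration of the two metadata representations described above, the following Python sketch converts the deflate filter between an HDF5-style integer form (filter id 1 with a single level parameter) and a Zarr-style JSON dictionary (codec id "zlib"). The function names `hdf5_to_codec` and `codec_to_hdf5` and the `CONVERTERS` table are hypothetical, chosen for illustration only; they are not the netcdf-c entry points, and the sketch assumes a single filter with a single parameter.

```python
import json

# HDF5 identifies the deflate (zlib) filter by the integer id 1.
H5Z_FILTER_DEFLATE = 1


def hdf5_to_codec(filter_id, params):
    """Convert an HDF5-style (id, integer params) pair to a Zarr-style JSON string.

    Hypothetical helper: handles only deflate with one parameter (the level).
    """
    if filter_id != H5Z_FILTER_DEFLATE or len(params) != 1:
        raise ValueError("unsupported filter id or parameter count")
    return json.dumps({"id": "zlib", "level": int(params[0])})


def codec_to_hdf5(codec_json):
    """Convert a Zarr-style JSON codec string back to an HDF5-style pair."""
    codec = json.loads(codec_json)
    if codec.get("id") != "zlib":
        raise ValueError("unsupported codec id")
    return H5Z_FILTER_DEFLATE, [int(codec["level"])]


# Hypothetical lookup table keyed by HDF5 filter id. Keeping the converters
# in a table separate from the compressor code loosely mirrors the option of
# shipping the Zarr conversion in its own library.
CONVERTERS = {H5Z_FILTER_DEFLATE: (hdf5_to_codec, codec_to_hdf5)}

if __name__ == "__main__":
    to_codec, to_hdf5 = CONVERTERS[H5Z_FILTER_DEFLATE]
    as_json = to_codec(H5Z_FILTER_DEFLATE, [5])
    print(as_json)
    print(to_hdf5(as_json))
```

In this sketch, support for another filter would be added by registering a new pair of converters in the table, leaving the existing entries untouched.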
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2021
- Bibcode: 2021AGUFMIN35D0418H