What Interdisciplinary Research, Heterogeneous Data, and Netflix have in common: Leveraging Open Data and the Cloud to Increase Data Access and Use with Ephemeral Archives of Convenience
Abstract
Earth science data is measured in petabytes, represents decades of data collection and of evolving technology and practices, and provides an unparalleled view of our planet. The pace of change is only accelerating: NASA and other agencies are on their way to making hundreds of petabytes of data available in the cloud; highly scalable processing and analysis architectures and tools are in active use, with more being developed every day; and each of these brings opportunities for optimization and innovation. Earth science data is a source of critical information for monitoring smoke, flooding impacts, burn scars, volcanic ash, and weather; however, finding and using this data can require significant investment.
The specific tool suite chosen by researchers should be whatever best supports their use case. There is a variety of tools (Hadoop, Redshift, Athena, Pangeo, SageMaker, SciKitLearn, etc.), data formats (GeoTIFF, Cloud Optimized GeoTIFF, HDF, CSV, GRIB, Zarr, etc.), and data providers (NASA, NOAA, ESA, Digital Globe, Planet), each bringing use-case-specific value and optimizations. This talk presents Element 84's approach to addressing this problem: a scalable, cloud-based processing pipeline in AWS that creates ephemeral, analysis-ready, heterogeneous Archives of Convenience and ephemeral services, making open data available through Pangeo, OPeNDAP, WMS, and desktop tools. As a concrete example of use-case-specific data transformation, we'll demonstrate leveraging video compression and streaming formats in the processing pipeline to make the entire NOAA GOES-16 archive interactive. Users can now easily identify dates of interest for events like natural disasters, and stage a subset of NOAA, ESA, and USGS archives for analysis.
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2018
- Bibcode:
- 2018AGUFMIN52A..08P
- Keywords:
- 3360 Remote sensing, ATMOSPHERIC PROCESSES;
- 1910 Data assimilation, integration and fusion, INFORMATICS;
- 1920 Emerging informatics technologies, INFORMATICS;
- 1926 Geospatial, INFORMATICS