Using Serverless Architecture for Earth Science Data Analytics in the Cloud
Abstract
Giovanni (https://giovanni.gsfc.nasa.gov/giovanni/) is a popular online data exploration tool at the NASA Goddard Earth Sciences Data and Information Services Center (GES DISC), providing 22 analysis and visualization services for over 1900 Earth Science data variables. For the Cloud-based Giovanni, built using Amazon Web Services (AWS), we evaluated (1) AWS native solutions to provide a scalable, serverless architecture; (2) open standards for data storage in the Cloud; and (3) end-user performance. The architecture is based on core best practices of the NASA Earth Science Data and Information System (ESDIS) Cloud Reference Architecture, with emphasis on exposing and consuming services and on using analysis-ready data.
For the prototype, we selected a resource-intensive Giovanni service, area-averaged time series, which is consistently popular with users. We designed a solution that pre-processes the data by aggregating in the spatial dimensions, producing analysis-ready data stored in Apache Parquet format. The analysis-ready data were exposed through AWS Athena, a serverless query service. Compared to on-premises Giovanni, tests indicate a large performance gain (300x) and extremely low compute cost. We will present Giovanni benchmarks for the test cases used in the study on Cloud Analytics of Earth Observations conducted by ESDIS, along with the pre-processing techniques used to create the analysis-ready data.
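As an illustration only, the idea of spatially aggregated, analysis-ready data behind an area-averaged time series can be sketched in Python. This is not the authors' implementation: the row layout (`time`, `lat`, `lon`, `value`, `weight`), the cosine-of-latitude weighting, the function names, and the SQL shown in the docstring are all assumptions made for this sketch. Rows of this shape could then be written to Parquet with a library such as pyarrow; that step is omitted here.

```python
import math
from collections import defaultdict


def to_rows(time_label, lats, lons, grid, fill_value=None):
    """Flatten one (lat x lon) time step into analysis-ready rows.

    Assumption: each row carries a cos(latitude) weight, so an area
    average becomes a plain weighted mean over the selected rows.
    """
    rows = []
    for i, lat in enumerate(lats):
        weight = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            value = grid[i][j]
            if value is None or value == fill_value:
                continue  # skip missing cells
            rows.append({"time": time_label, "lat": lat, "lon": lon,
                         "value": value, "weight": weight})
    return rows


def area_averaged_time_series(rows, south, north, west, east):
    """Weighted mean per time step inside a bounding box.

    Mirrors, in plain Python, a hypothetical query of the form:
      SELECT time, SUM(value * weight) / SUM(weight)
      FROM some_table
      WHERE lat BETWEEN south AND north AND lon BETWEEN west AND east
      GROUP BY time ORDER BY time
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for r in rows:
        if south <= r["lat"] <= north and west <= r["lon"] <= east:
            num[r["time"]] += r["value"] * r["weight"]
            den[r["time"]] += r["weight"]
    return [(t, num[t] / den[t]) for t in sorted(num) if den[t] > 0]


# Tiny made-up grid: two latitudes, two longitudes, two months.
lats = [0.0, 60.0]
lons = [0.0, 1.0]
rows = []
rows += to_rows("2018-01", lats, lons, [[1.0, 1.0], [4.0, 4.0]])
rows += to_rows("2018-02", lats, lons, [[2.0, None], [2.0, 2.0]])

series = area_averaged_time_series(rows, -90, 90, -180, 180)
for time_label, mean_value in series:
    print(time_label, round(mean_value, 6))
```

Storing the weight alongside each value keeps the query side down to a filter and two sums per time step, which is the kind of work a serverless SQL engine over columnar files handles well.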
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2018
- Bibcode: 2018AGUFMIN51B0577H
- Keywords:
  - 1908 Cyberinfrastructure; INFORMATICS
  - 1920 Emerging informatics technologies; INFORMATICS
  - 1932 High-performance computing; INFORMATICS
  - 1976 Software tools and services; INFORMATICS