Cloud Giovanni: Reining in Costs and Improving Performance with Analytical Data Stores using Scalable Serverless Architecture
Abstract
Giovanni is the Geospatial Interactive Online Visualization ANd aNalysis Infrastructure developed at NASA's GES DISC, which provides a simple and intuitive way to visualize, analyze, and access vast amounts of Earth science data. It receives a large number of user requests each day for a variety of analysis and visualization services, which leads to the "big data" challenge of serving steadily growing data volumes with diverse statistical algorithms. We propose a multi-dimensional accumulation method that provides fast and cost-efficient cloud analysis for diverse services, including both area averaging and time averaging. The method performs weighted volume integration over multiple variable dimensions (time and space) and is implemented on AWS using Athena to provide serverless, highly scalable data analysis. Compared with the standard method, this approach reduces computational time by an order of magnitude while incurring minimal AWS cost. For example, for a benchmark of 10-year area averaging over a 1x1 degree daily variable, the computational time is reduced from minutes to seconds, and the computational cost is only $5 for 100,000 requests.
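The abstract does not spell out how the accumulation is built, so the following is only a minimal sketch of one plausible reading: precompute cumulative (prefix) sums of the weighted values and of the weights over time, latitude, and longitude, so that any time-and-area average reduces to a handful of lookups instead of a scan over every grid cell. The cosine-of-latitude weighting, the in-memory nested lists, and all function names here are assumptions made for illustration; this is not the Athena implementation described above.

```python
import math

def prefix_sums_3d(values):
    """Zero-padded 3-D prefix sums: P[t][i][j] = sum of values[:t][:i][:j]."""
    nt, ny, nx = len(values), len(values[0]), len(values[0][0])
    P = [[[0.0] * (nx + 1) for _ in range(ny + 1)] for _ in range(nt + 1)]
    for t in range(nt):
        for i in range(ny):
            for j in range(nx):
                # Inclusion-exclusion over the three accumulated dimensions.
                P[t + 1][i + 1][j + 1] = (
                    values[t][i][j]
                    + P[t][i + 1][j + 1] + P[t + 1][i][j + 1] + P[t + 1][i + 1][j]
                    - P[t][i][j + 1] - P[t][i + 1][j] - P[t + 1][i][j]
                    + P[t][i][j]
                )
    return P

def box_sum(P, t0, t1, i0, i1, j0, j1):
    """Sum over half-open ranges [t0,t1) x [i0,i1) x [j0,j1) with 8 lookups."""
    return (
        P[t1][i1][j1]
        - P[t0][i1][j1] - P[t1][i0][j1] - P[t1][i1][j0]
        + P[t0][i0][j1] + P[t0][i1][j0] + P[t1][i0][j0]
        - P[t0][i0][j0]
    )

def build_accumulations(data, lats_deg):
    """Accumulate weighted values and weights (assumed weight: cos(latitude))."""
    w = [math.cos(math.radians(lat)) for lat in lats_deg]
    weighted = [[[w[i] * v for v in row] for i, row in enumerate(grid)] for grid in data]
    weights = [[[w[i]] * len(row) for i, row in enumerate(grid)] for grid in data]
    return prefix_sums_3d(weighted), prefix_sums_3d(weights)

def area_time_mean(acc_wv, acc_w, t0, t1, i0, i1, j0, j1):
    """Weighted mean over a time range and lat/lon box from the accumulations."""
    return box_sum(acc_wv, t0, t1, i0, i1, j0, j1) / box_sum(acc_w, t0, t1, i0, i1, j0, j1)

def brute_force_mean(data, lats_deg, t0, t1, i0, i1, j0, j1):
    """Reference implementation that visits every cell in the requested box."""
    num = den = 0.0
    for t in range(t0, t1):
        for i in range(i0, i1):
            w = math.cos(math.radians(lats_deg[i]))
            for j in range(j0, j1):
                num += w * data[t][i][j]
                den += w
    return num / den

# Small synthetic grid: 6 time steps, 4 latitudes, 5 longitudes.
lats = [-30.0, -10.0, 10.0, 30.0]
data = [[[t + 0.5 * i + 0.1 * j for j in range(5)] for i in range(4)] for t in range(6)]

acc_wv, acc_w = build_accumulations(data, lats)
fast = area_time_mean(acc_wv, acc_w, 1, 5, 1, 3, 0, 4)
brute = brute_force_mean(data, lats, 1, 5, 1, 3, 0, 4)
```

In this sketch the accumulations are built once, and each request then costs a fixed number of lookups regardless of how many days or grid cells it spans, which is the kind of saving that would turn a multi-year scan into a near-constant-time query.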
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2019
- Bibcode: 2019AGUFMIN13B0708Z
- Keywords: 1908 Cyberinfrastructure (INFORMATICS); 1926 Geospatial (INFORMATICS); 1942 Machine learning (INFORMATICS); 1996 Web Services (INFORMATICS)