Hybrid Pluggable Processing Pipeline (HyP3): A cloud-based infrastructure for generic processing of SAR data
Abstract
A problem often faced by Earth science researchers is how to scale algorithms that were developed against few datasets and take them to regional or global scales. One significant hurdle can be the processing and storage resources available for such a task, not to mention the administration of those resources. As a processing environment, the cloud offers nearly unlimited potential for compute and storage, with limited administration required. The goal of the Hybrid Pluggable Processing Pipeline (HyP3) project was to demonstrate the utility of the Amazon cloud to process large amounts of data quickly and cost effectively, while remaining generic enough to incorporate new algorithms with limited administration time or expense. Principally built by three undergraduate students at the ASF DAAC, the HyP3 system relies on core Amazon services such as Lambda, the Simple Notification Service (SNS), Relational Database Service (RDS), Elastic Compute Cloud (EC2), Simple Storage Service (S3), and Elastic Beanstalk. The HyP3 user interface was written using elastic beanstalk, and the system uses SNS and Lamdba to handle creating, instantiating, executing, and terminating EC2 instances automatically. Data are sent to S3 for delivery to customers and removed using standard data lifecycle management rules. In HyP3 all data processing is ephemeral; there are no persistent processes taking compute and storage resources or generating added cost. When complete, HyP3 will leverage the automatic scaling up and down of EC2 compute power to respond to event-driven demand surges correlated with natural disaster or reprocessing efforts. Massive simultaneous processing within EC2 will be able match the demand spike in ways conventional physical computing power never could, and then tail off incurring no costs when not needed. This presentation will focus on the development techniques and technologies that were used in developing the HyP3 system. Data and process flow will be shown, highlighting the benefits of the cloud for each step. Finally, the steps for integrating a new processing algorithm will be demonstrated. This is the true power of HyP3; allowing people to upload their own algorithms and execute them at archive level scales.
- Publication:
-
AGU Fall Meeting Abstracts
- Pub Date:
- December 2016
- Bibcode:
- 2016AGUFMIN21B1740H
- Keywords:
-
- 1908 Cyberinfrastructure;
- INFORMATICSDE: 1910 Data assimilation;
- integration and fusion;
- INFORMATICSDE: 1920 Emerging informatics technologies;
- INFORMATICSDE: 1964 Real-time and responsive information delivery;
- INFORMATICS