Collaborative Science Using Web Services and the SciFlo Grid Dataflow Engine
Abstract
The General Earth Science Investigation Suite (GENESIS) project is a NASA-sponsored partnership between the Jet Propulsion Laboratory, academia, and NASA data centers to develop a new suite of Web Services tools to facilitate multi-sensor investigations in Earth System Science. The goal of GENESIS is to enable large-scale, multi-instrument atmospheric science using combined datasets from the AIRS, MODIS, MISR, and GPS sensors. Investigations include cross-comparison of spaceborne climate sensors, cloud spectral analysis, study of upper troposphere-stratosphere water transport, study of the aerosol indirect cloud effect, and global climate model validation. The challenges are to bring together very large datasets, reformat and understand the individual instrument retrievals, co-register or re-grid the retrieved physical parameters, perform computationally-intensive data fusion and data mining operations, and accumulate complex statistics over months to years of data. To meet these challenges, we have developed a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data access, subsetting, registration, mining, fusion, compression, and advanced statistical analysis. SciFlo leverages remote Web Services, called via Simple Object Access Protocol (SOAP) or REST (one-line) URLs, and the Grid Computing standards (WS-* &Globus Alliance toolkits), and enables scientists to do multi-instrument Earth Science by assembling reusable Web Services and native executables into a distributed computing flow (tree of operators). The SciFlo client &server engines optimize the execution of such distributed data flows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. In particular, SciFlo exploits the wealth of datasets accessible by OpenGIS Consortium (OGC) Web Mapping Servers & Web Coverage Servers (WMS/WCS), and by Open Data Access Protocol (OpenDAP) servers. The scientist injects a distributed computation into the Grid by simply filling out an HTML form or directly authoring the underlying XML dataflow document, and results are returned directly to the scientist's desktop. Once an analysis has been specified for a chunk or day of data, it can be easily repeated with different control parameters or over months of data. Recently, the Earth Science Information Partners (ESIP) Federation sponsored a collaborative activity in which several ESIP members advertised their respective WMS/WCS and SOAP services, developed some collaborative science scenarios for atmospheric and aerosol science, and then choreographed services from multiple groups into demonstration workflows using the SciFlo engine and a Business Process Execution Language (BPEL) workflow engine. For several scenarios, the same collaborative workflow was executed in three ways: using hand-coded scripts, by executing a SciFlo document, and by executing a BPEL workflow document. We will discuss the lessons learned from this activity, the need for standardized interfaces (like WMS/WCS), the difficulty in agreeing on even simple XML formats and interfaces, and further collaborations that are being pursued.
- Publication:
-
AGU Fall Meeting Abstracts
- Pub Date:
- December 2006
- Bibcode:
- 2006AGUFMIN42A..03W
- Keywords:
-
- 0300 ATMOSPHERIC COMPOSITION AND STRUCTURE;
- 1616 Climate variability (1635;
- 3305;
- 3309;
- 4215;
- 4513)