SciFlo: Semantically-Enabled Grid Workflow for Collaborative Science
Abstract
SciFlo is a system for Scientific Knowledge Creation on the Grid using a Semantically-Enabled Dataflow Execution Environment. SciFlo leverages Simple Object Access Protocol (SOAP) Web Services and the Grid Computing standards (WS-* standards and the Globus Alliance toolkits), and enables scientists to do multi-instrument Earth Science by assembling reusable SOAP Services, native executables, local command-line scripts, and python codes into a distributed computing flow (a graph of operators). SciFlo's XML dataflow documents can be a mixture of concrete operators (fully bound operations) and abstract template operators (late binding via semantic lookup). All data objects and operators can be both simply typed (simple and complex types in XML schema) and semantically typed using controlled vocabularies (linked to OWL ontologies such as SWEET). By exploiting ontology-enhanced search and inference, one can discover (and automatically invoke) Web Services and operators that have been semantically labeled as performing the desired transformation, and adapt a particular invocation to the proper interface (number, types, and meaning of inputs and outputs). The SciFlo client & server engines optimize the execution of such distributed data flows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The scientist injects a distributed computation into the Grid by simply filling out an HTML form or directly authoring the underlying XML dataflow document, and results are returned directly to the scientist's desktop. A Visual Programming tool is also being developed, but it is not required. Once an analysis has been specified for a granule or day of data, it can be easily repeated with different control parameters and over months or years of data. SciFlo uses and preserves semantics, and also generates and infers new semantic annotations. Specifically, the SciFlo engine uses semantic metadata to understand (infer) what it is doing and potentially improve the data flow; preserves semantics by saving links to the semantics of (metadata describing) the input datasets, related datasets, and the data transformations (algorithms) used to generate downstream products; generates new metadata by allowing the user to add semantic annotations to the generated data products (or simply accept automatically generated provenance annotations); and infers new semantic metadata by understanding and applying logic to the semantics of the data and the transformations performed. Much ontology development still needs to be done but, nevertheless, SciFlo documents provide a substrate for using and preserving more semantics as ontologies develop. We will give a live demonstration of the growing SciFlo network using an example dataflow in which atmospheric temperature and water vapor profiles from three Earth Observing System (EOS) instruments are retrieved using SOAP (geo-location query & data access) services, co-registered, and visually & statistically compared on demand (see http://sciflo.jpl.nasa.gov for more information).
- Publication:
-
AGU Fall Meeting Abstracts
- Pub Date:
- December 2005
- Bibcode:
- 2005AGUFMIN43A0322Y
- Keywords:
-
- 0300 ATMOSPHERIC COMPOSITION AND STRUCTURE;
- 0350 Pressure;
- density;
- and temperature;
- 1610 Atmosphere (0315;
- 0325);
- 1640 Remote sensing (1855);
- 3305 Climate change and variability (1616;
- 1635;
- 3309;
- 4215;
- 4513)