Progress on big data publication and documentation for machine-to-machine discovery, access, and processing

Progress on big data publication and documentation for machine-to-machine discovery, access, and processing

High-resolution data for use in environmental modeling is increasingly becoming available at broad spatial and temporal scales. Downscaled climate projections, remotely sensed landscape parameters, and land-use/land-cover projections are examples of datasets that may exceed an individual investigation's data management and analysis capacity. To allow projects on limited budgets to work with many of these data sets, the burden of working with them must be reduced. The approach being pursued at the U.S. Geological Survey Center for Integrated Data Analytics uses standard self-describing web services that allow machine to machine data access and manipulation. These techniques have been implemented and deployed in production level server-based Web Processing Services that can be accessed from a web application or scripted workflow. Data publication techniques that allow machine-interpretation of large collections of data have also been implemented for numerous datasets at U.S. Geological Survey data centers as well as partner agencies and academic institutions. Discovery of data services is accomplished using a method in which a machine-generated metadata record holds content--derived from the data's source web service--that is intended for human interpretation as well as machine interpretation. A distributed search application has been developed that demonstrates the utility of a decentralized search of data-owner metadata catalogs from multiple agencies. The integrated but decentralized system of metadata, data, and server-based processing capabilities will be presented. The design, utility, and value of these solutions will be illustrated with applied science examples and success stories. Datasets such as the EPA's Integrated Climate and Land Use Scenarios, USGS/NASA MODIS derived land cover attributes, and downscaled climate projections from several sources are examples of data this system includes. These and other datasets, have been published as standard, self-describing, web services that provide the ability to inspect and subset the data. This presentation will demonstrate this file-to-web service concept and how it can be used from script-based workflows or web applications.

Publication:

AGU Fall Meeting Abstracts

Pub Date:

December 2013

Bibcode:

2013AGUFMIN31C1522W

Keywords:

1916 INFORMATICS Data and information discovery;
1996 INFORMATICS Web Services

NASA/ADS

Progress on big data publication and documentation for machine-to-machine discovery, access, and processing

No Sources Found