The Geo Data Portal: An Example Physical and Application Architecture Demonstrating the Power of the "Cloud" Concept
Abstract
The U.S. Geological Survey Center for Integrated Data Analytics (CIDA), in keeping with the President's Digital Government Strategy and the Department of the Interior's IT Transformation initiative, has evolved its data center and application architecture toward the "cloud" paradigm. In this case, "cloud" refers to a goal of developing services that may be distributed to infrastructure anywhere on the Internet. This transition has taken place across the entire data management spectrum, from data center location to physical hardware configuration to software design and implementation. In CIDA's case, physical hardware resides in Madison at the Wisconsin Water Science Center, in South Dakota at the Earth Resources Observation and Science Center (EROS), and, in the near future, at a DOI-approved commercial vendor. Tasks normally conducted on desktop-based GIS software with local copies of data in proprietary formats are now done using browser-based interfaces to web processing services that draw on a network of standard data-source web services. Organizations are gaining economies of scale through data center consolidation and the creation of private cloud services, as well as taking advantage of the commoditization of data processing services. Leveraging open standards for data and data management takes advantage of this commoditization and provides the means to reliably build distributed, service-based systems. This presentation will use CIDA's experience as an illustration of the benefits and hurdles of moving to the cloud. Replicating, reformatting, and processing large data sets, such as downscaled climate projections, traditionally present a substantial challenge to environmental science researchers who need access to data subsets and derived products. The USGS Geo Data Portal (GDP) project uses cloud concepts to help earth system scientists access subsets, spatial summaries, and derivatives of commonly needed very large data sets.
The GDP project has developed a reusable architecture and advanced processing services that currently access archives hosted at Lawrence Livermore National Lab, Oregon State University, the University Corporation for Atmospheric Research, and the U.S. Geological Survey, among others. Several examples of how the GDP project uses cloud concepts will be highlighted in this presentation: 1) The high-bandwidth network connectivity of large data centers reduces the need for data replication and storage local to processing services. 2) Standard data-serving web services, such as OPeNDAP, Web Coverage Services, and Web Feature Services, allow GDP services to remotely access custom subsets of data in a variety of formats, further reducing the need for data replication and reformatting. 3) The GDP services use standard web service APIs to allow browser-based user interfaces to run complex and compute-intensive processes for users from any computer with an Internet connection. The combination of physical infrastructure and application architecture implemented for the Geo Data Portal project offers an operational example of how distributed data and processing on the cloud can be used to aid earth system science.
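To make point 2 concrete, the following is a minimal sketch of how a client might request only a subset from a remote data service instead of copying a whole archive. It is not the GDP implementation: the host `example.org`, the variable `pr`, the layer `demo:watersheds`, and both helper functions are hypothetical names chosen for illustration. The sketch only builds request URLs, using the DAP2 hyperslab syntax `[start:stride:stop]` for OPeNDAP and standard key-value parameters for a WFS `GetFeature` request.

```python
from urllib.parse import urlencode


def opendap_subset_url(dataset_url, variable, index_ranges, suffix=".ascii"):
    """Build an OPeNDAP (DAP2) URL that requests a hyperslab of one variable.

    index_ranges is a list of (start, stride, stop) index triples, one per
    dimension of the variable. The suffix selects the response type; ".ascii"
    is assumed here for readability.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in index_ranges
    )
    return f"{dataset_url}{suffix}?{variable}{hyperslab}"


def wfs_getfeature_url(endpoint, type_name, bbox, version="1.1.0"):
    """Build a WFS GetFeature URL limited to a bounding box.

    bbox is four numbers; their axis order depends on the server and
    coordinate reference system, so treat the order used below as an assumption.
    """
    params = {
        "service": "WFS",
        "version": version,
        "request": "GetFeature",
        "typeName": type_name,
        "bbox": ",".join(str(v) for v in bbox),
    }
    # Keep commas and colons readable instead of percent-encoding them.
    return f"{endpoint}?{urlencode(params, safe=',:')}"


# Hypothetical endpoints, for illustration only.
grid_url = opendap_subset_url(
    "https://example.org/dods/precip.nc",
    "pr",
    [(0, 1, 11), (40, 1, 60), (100, 1, 130)],  # time, lat, lon index ranges
)
feature_url = wfs_getfeature_url(
    "https://example.org/wfs",
    "demo:watersheds",
    (-93.0, 42.0, -86.0, 47.0),
)
print(grid_url)
print(feature_url)
```

Because the subset is expressed in the request itself, the server returns only the requested slice or features, which is the property the abstract relies on to avoid replicating and reformatting large archives.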
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2012
- Bibcode: 2012AGUFMIN33B1538B
- Keywords:
- 1904 INFORMATICS / Community standards;
- 1936 INFORMATICS / Interoperability;
- 1940 INFORMATICS / Machine-to-machine communication;
- 1982 INFORMATICS / Standards