Developing and Adapting Data Management Services Across Multiple Virtual Observatories
Abstract
Whether the data come from images captured by drones or field scanners, sensors in the field, lab analyses, remote sensing, or external data providers, modern virtual observatories require a wide variety of features to support cross-cutting research. We have developed two software frameworks to support a broad range of data management practices. Clowder provides a cloud-based framework to store, curate, and analyze large amounts of data and metadata. The Geostreaming Data Framework provides a JSON-based geo-temporal web service API; a web application to visualize, search, and download the data; and data parsing software libraries that normalize data from different sources into one common, flexible schema.
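The normalization step described above can be illustrated with a minimal sketch. It is not the Geostreaming Data Framework's actual parser or schema: the function name `to_datapoint`, the two source labels, the input field names, and the GeoJSON-style output shape with `start_time` and `end_time` are all assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def to_datapoint(source, record):
    """Map one source-specific record onto a shared geo-temporal shape.

    Both input layouts and the output layout are hypothetical.
    """
    if source == "csv_station":
        # Hypothetical CSV-style row: flat string columns, one parameter per row.
        lon, lat = float(record["lon"]), float(record["lat"])
        when = datetime.strptime(record["date"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
        properties = {record["parameter"]: float(record["value"])}
    elif source == "sensor_feed":
        # Hypothetical sensor message: epoch seconds and a dict of readings.
        lon, lat = record["position"]
        when = datetime.fromtimestamp(record["epoch"], tz=timezone.utc)
        properties = dict(record["readings"])
    else:
        raise ValueError(f"unknown source: {source}")

    timestamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "start_time": timestamp,
        "end_time": timestamp,
        "properties": properties,
    }
```

Keeping measurements in an open `properties` dictionary is one way to keep a common schema flexible: a new source can contribute new parameters without changing the structure that search and visualization code depends on.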
The two frameworks have been deployed, improved, and integrated to support five virtual observatories. Great Lakes Monitoring provides easy access to environmental monitoring data collected throughout the Great Lakes by the U.S. EPA and other agencies. Great Lakes to Gulf Virtual Observatory collects water quality monitoring data aggregated from multiple sources along the Mississippi River and its tributaries, focusing on excess nutrients and hypoxia in the Gulf of Mexico. Intensively Managed Landscapes Critical Zone Observatory collects environmental data at three sites in Illinois, Iowa, and Minnesota, with the aim of understanding the short-term and long-term resilience of the crucial ecological, hydrological, and climatic services provided by the Critical Zone. TERRA-REF catalogs the output of the LemnaTec Field Scanalyzer in Arizona, the largest high-throughput phenotyping field-scanning robot in the world. Vector Borne Disease uses West Nile virus infection rates from mosquito traps, along with weather data, to predict future mosquito infection rates. We describe how the data pipelines for these five use cases differ. We highlight that one software solution rarely fits all use cases and that it is important to build open systems with extensibility and flexibility in mind. We describe how our architectures leverage different technologies, standards, and services, including Open Geospatial Consortium standards, GeoServer, PostGIS, Globus Transfer, and JSON-LD, to support different use cases and data formats.
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2018
- Bibcode: 2018AGUFMIN53C0633K
- Keywords:
  - 1916 Data and information discovery; INFORMATICS
  - 1930 Data and information governance; INFORMATICS
  - 1946 Metadata; INFORMATICS
  - 1976 Software tools and services; INFORMATICS