Experiences and lessons learned from creating a generalized workflow for data publication of field campaign datasets
Abstract
This paper summarizes the workflow challenges of curating and publishing data produced from disparate data sources and provides a generalized workflow solution to efficiently archive data generated by researchers. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) for biogeochemical dynamics and the Global Hydrology Resource Center (GHRC) DAAC have been collaborating on the development of a generalized workflow solution to efficiently manage the data publication process. The generalized workflow presented here are built on lessons learned from implementations of the workflow system. Data publication consists of the following steps:
Accepting the data package from the data providers, ensuring the full integrity of the data files. Identifying and addressing data quality issues Assembling standardized, detailed metadata and documentation, including file level details, processing methodology, and characteristics of data files Setting up data access mechanisms Setup of the data in data tools and services for improved data dissemination and user experience Registering the dataset in online search and discovery catalogues Preserving the data location through Digital Object Identifiers (DOI) We will describe the steps taken to automate, and realize efficiencies to the above process. The goals of the workflow system are to reduce the time taken to publish a dataset, to increase the quality of documentation and metadata, and to track individual datasets through the data curation process. Utilities developed to achieve these goal will be described. We will also share metrics driven value of the workflow system and discuss the future steps towards creation of a common software framework.- Publication:
-
AGU Fall Meeting Abstracts
- Pub Date:
- December 2017
- Bibcode:
- 2017AGUFMIN43A0062S
- Keywords:
-
- 1904 Community standards;
- INFORMATICS;
- 1912 Data management;
- preservation;
- rescue;
- INFORMATICS;
- 1978 Software re-use;
- INFORMATICS;
- 6610 Funding;
- PUBLIC ISSUES