Rolling Deck to Repository (R2R): Organizing Datasets from Heterogeneous Shipboard Data into an Integrated Catalog
Abstract
The goal of the Rolling Deck to Repository (R2R; rvdata.us) program is to develop and implement a fleet-wide information management system to preserve and provide access to routine underway data collected by U.S. academic research vessels. One of the program’s primary challenges is to develop a workflow for routinely gathering data from a fleet with heterogeneous instrumentation and file systems, breaking out the data into discrete data sets, and transferring the data to the appropriate National Data Center. Because R2R expects to receive 300-400 cruises per year from a wide range of vessel classes and at least two dozen major device types, this workflow must be highly automated. R2R has developed and implemented a workflow for processing underway data:
- Collection: data are provided at the end of a cruise as a cruise distribution using the mechanism preferred by the vessel operator.
- Inventory: a complete inventory of each cruise distribution is created, listing the filename, date, size, and checksum. This inventory is published online by R2R.
- Raw Archive: the entire original cruise distribution is transmitted securely to the National Geophysical Data Center for deep archive.
- Data Breakout: data from each device are extracted from the cruise distribution according to a vessel profile yielding a discrete data set.
Once a data set is broken out and any proprietary holds are cleared, it is delivered to the appropriate National Data Center for archiving and dissemination. It may also be processed further for quality assessment/quality control and/or publication of standard products. Data sets sent to a National Data Center will have unique and persistent R2R identifiers, which link to parent cruise information on the R2R site. The site will include the URL link to each data set at the National Data Center, allowing a user browsing the R2R cruise catalog to download it.
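The Inventory step can be illustrated with a minimal sketch. It assumes the cruise distribution is a directory tree on local disk, takes the file modification time as the "date", and uses SHA-256 as the checksum; the abstract does not specify any of these, and the function names `file_checksum` and `build_inventory` are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large data files are never read into memory at once."""
    digest = hashlib.new(algorithm)  # assumption: SHA-256; the source names no algorithm
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(distribution_root):
    """List filename, date, size, and checksum for every file under the root."""
    root = Path(distribution_root)
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        stat = path.stat()
        records.append({
            # Path relative to the distribution root, with forward slashes.
            "filename": path.relative_to(root).as_posix(),
            # Assumption: "date" means the file modification time, reported in UTC.
            "date": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
            "size": stat.st_size,
            "checksum": file_checksum(path),
        })
    return records
```

Calling `build_inventory("/path/to/cruise_distribution")` returns one record per file. Such a list could be written out as CSV or JSON for publication, and the checksums make it possible to verify files after transfer.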
Because each vessel operator typically has a customized system for naming and organizing data sets, as well as an individual set of instruments, the mapping between the cruise directory structure and the set of standard data types is accomplished through a vessel profile. The vessel profile holds information on the device type, make, model, and (optionally) location of each instrument, plus information on how that instrument’s files are represented in the cruise distribution. The device list is supported by a controlled vocabulary of major underway device types. This approach balances support for the historical heterogeneity and individual needs of the vessel operators against the need for a scalable, largely automated breakout process to populate an integrated cruise catalog. To facilitate automation further, R2R is working with vessel operators to develop a standard directory structure that could be adopted across the UNOLS fleet and provide a standard way to represent data. Implementing this structure would reduce the time-intensive work of mapping between each vessel’s unique cruise distribution structure and the standard device types.
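One way to picture a vessel profile and the breakout it drives is sketched below. The abstract does not describe how a profile is stored, so this sketch assumes each device entry carries a shell-style filename pattern. The `Device` class, the `break_out` function, the device types, makes, models, and patterns are all hypothetical placeholders, not R2R's actual vocabulary or schema.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Device:
    """One instrument entry in a (hypothetical) vessel profile."""
    device_type: str                # term from a controlled vocabulary of device types
    make: str
    model: str
    path_pattern: str               # assumption: files are located by a shell-style pattern
    location: Optional[str] = None  # optional, as in the abstract


def break_out(profile: List[Device], filenames: List[str]) -> Dict[str, List[str]]:
    """Group inventory filenames into one data set per device type."""
    datasets: Dict[str, List[str]] = {d.device_type: [] for d in profile}
    datasets["unassigned"] = []  # files that match no device are kept visible for review
    for name in filenames:
        for device in profile:
            # First matching device wins; matching is case-sensitive on every platform.
            if fnmatchcase(name, device.path_pattern):
                datasets[device.device_type].append(name)
                break
        else:
            datasets["unassigned"].append(name)
    return datasets


# Placeholder profile for an imaginary vessel.
example_profile = [
    Device("multibeam", "ExampleCo", "MB-1", "multibeam/*.all"),
    Device("gravimeter", "ExampleCo", "G-1", "grav/*.dat", location="main lab"),
]
```

With this profile, `break_out(example_profile, ["multibeam/line0001.all", "grav/day001.dat", "docs/report.pdf"])` places the first two files in their device data sets and the third under `"unassigned"`. A fleet-wide standard directory structure would let many vessels share the same patterns instead of each needing a hand-built profile.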
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2010
- Bibcode: 2010AGUFMIN41B1365C
- Keywords: 1912 INFORMATICS / Data management, preservation, rescue; 1998 INFORMATICS / Workflow; 4260 OCEANOGRAPHY: GENERAL / Ocean data assimilation and reanalysis