The Kiel data management infrastructure - arising from a generic data model
Abstract
The Kiel Data Management Infrastructure (KDMI) started from a cooperation of three large-scale projects (SFB574, SFB754 and Cluster of Excellence The Future Ocean) and the Leibniz Institute of Marine Sciences (IFM-GEOMAR). The common strategy for project data management is a single person collecting and transforming data according to the requirements of the targeted data center(s). The intention of the KDMI cooperation is to avoid redundant and potentially incompatible data management efforts for scientists and data managers and to create a single sustainable infrastructure. An increased level of complexity in the conceptual planing arose from the diversity of marine disciplines and approximately 1000 scientists involved. KDMI key features focus on the data provenance which we consider to comprise the entire workflow from field sampling thru labwork to data calculation and evaluation. Managing the data of each individual project participant in this way yields the data management for the entire project and warrants the reusability of (meta)data. Accordingly scientists provide a workflow definition of their data creation procedures resulting in their target variables. The central idea in the development of the KDMI presented here is based on the object oriented programming concept which allows to have one object definition (workflow) and infinite numbers of object instances (data). Each definition is created by a graphical user interface and produces XML output stored in a database using a generic data model. On creation of a data instance the KDMI translates the definition into web forms for the scientist, the generic data model then accepts all information input following the given data provenance definition. An important aspect of the implementation phase is the possibility of a successive transition from daily measurement routines resulting in single spreadsheet files with well known points of failure and limited reuseability to a central infrastructure as a single point of truth. The data provenance approach has the following positive side effects: (1) the scientist designs the extend and timing of data and metadata prompts by workflow definitions himself while (2) consistency and completeness (mandatory information) of metadata in the resulting XML document can be checked by XML validation. (3) Storage of the entire data creation process (including raw data and processing steps) provides a multidimensional quality history accessible by all researchers in addition to the commonly applied one dimensional quality flag system. (4) The KDMI can be extended to other scientific disciplines by adding new workflows and domain specific outputs assisted by the KDMI-Team. The KDMI is a social network inspired system but instead of sharing privacy it is a sharing platform for daily scientific work, data and their provenance.
- Publication:
-
AGU Fall Meeting Abstracts
- Pub Date:
- December 2010
- Bibcode:
- 2010AGUFMIN43C..02F
- Keywords:
-
- 1912 INFORMATICS / Data management;
- preservation;
- rescue;
- 1930 INFORMATICS / Data and information governance;
- 1948 INFORMATICS / Metadata: Provenance;
- 1998 INFORMATICS / Workflow