Versioning of Research Data - Patterns and Principles
Abstract
In data-driven research, it is becoming increasingly important for researchers to be able to cite the exact extract of the dataset that underpins their research publication.
The need for best practices in data versioning was recognised by the Research Data Alliance (RDA), and the current state of practice was explored through an RDA Interest Group. The work published by the RDA Dynamic Data Citation Working Group and by the W3C Dataset Exchange Working Group, as well as the work of other groups on data provenance and data citation, highlighted that definitions of data versioning concepts and recommended practices were still missing. Over the past two years, the RDA Data Versioning Working Group and its precursor Interest Group collected numerous use cases of data versioning practices and extracted data versioning patterns from them. A key element that emerged from the analysis of these use cases was the necessary distinction between revision, release, and manifestation of a dataset. Dataset revisions refer to changes in the bitstream: whenever a dataset is changed, the result is considered a revision. Tracking revisions is a technical process that may document the magnitude of a change but does not convey its significance. A dataset may undergo several revisions before it is considered to be "final" and is subsequently published as a data release. The significance of the changes in a new release depends on the impact these changes have on the designated user community. Some datasets are published in different formats or encodings but are equivalent in their content. Following the model of Functional Requirements for Bibliographic Records (FRBR), such datasets can be seen as manifestations of the same intellectual work. The three cases outlined above are currently subsumed under the term "version", yet each represents a distinct pattern and requires different treatment with respect to identification, publication and citation.
Publication: AGU Fall Meeting Abstracts
Pub Date: December 2019
Bibcode: 2019AGUFMIN23D0892D
Keywords:
- 1904 Community standards; INFORMATICS
- 1910 Data assimilation, integration and fusion; INFORMATICS
- 1912 Data management, preservation, rescue; INFORMATICS
- 1916 Data and information discovery; INFORMATICS
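
The distinction the abstract draws between revision, release, and manifestation can be illustrated with a small data model. The following is a minimal sketch, not a schema proposed by the RDA Working Group: the `Dataset` class, its method names, the `v1.0` label, and the use of SHA-256 checksums to detect bitstream changes are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(payload: bytes) -> str:
    """Identify a bitstream by its SHA-256 digest."""
    return hashlib.sha256(payload).hexdigest()


@dataclass
class Dataset:
    """Hypothetical record that keeps the three notions of "version" apart."""
    revisions: list = field(default_factory=list)       # one checksum per bitstream change
    releases: dict = field(default_factory=dict)        # release label -> revision checksum
    manifestations: dict = field(default_factory=dict)  # release label -> {format: checksum}

    def revise(self, payload: bytes) -> str:
        """Record a revision only when the bitstream actually changed."""
        digest = checksum(payload)
        if not self.revisions or self.revisions[-1] != digest:
            self.revisions.append(digest)
        return digest

    def release(self, label: str) -> None:
        """Publish the latest revision under a citable label."""
        if not self.revisions:
            raise ValueError("nothing to release")
        self.releases[label] = self.revisions[-1]

    def add_manifestation(self, label: str, fmt: str, payload: bytes) -> None:
        """Attach another encoding of an already released dataset."""
        if label not in self.releases:
            raise KeyError(label)
        self.manifestations.setdefault(label, {})[fmt] = checksum(payload)


ds = Dataset()
ds.revise(b"a,b\n1,2\n")
ds.revise(b"a,b\n1,3\n")  # the bitstream changed: a second revision
ds.revise(b"a,b\n1,3\n")  # identical bytes: no new revision is recorded
ds.release("v1.0")
ds.add_manifestation("v1.0", "csv", b"a,b\n1,3\n")
ds.add_manifestation("v1.0", "json", b'[{"a": 1, "b": 3}]')
```

In this sketch the two manifestations of `v1.0` have different checksums yet belong to the same release, which is why a checksum alone can track revisions but cannot establish that two encodings are equivalent in content; that equivalence has to be asserted by the publisher.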