Semantic Approaches to Enhancing Data Findability and Interoperability in the NSF DataONE and Arctic Data Center Data Repositories
Abstract
Reproducibility and synthesis of scientific insights vitally depend on researchers having access to other researchers' data. Preserving data in robust, sustained repositories is a major step toward giving investigators access to relevant information. However, environmental science research often requires data from disparate disciplines and sources, including climatological, geographical, ecological, biodiversity, and genomic data, among others. In addition, environmental science researchers use their own specialized terms and abbreviations to describe their processes and measurements, which further complicates data discovery and interpretation. Researchers therefore face significant challenges in finding and re-using others' data, especially in synthesis efforts, where the data must also be harmonized and integrated.
Within the NSF-sponsored DataONE (http://dataone.org) and Arctic Data Center (http://arcticdata.io) data systems, we have enhanced a well-established XML schema-based metadata standard, Ecological Metadata Language (EML; https://github.com/NCEAS/eml), with the capability to reference external controlled vocabularies, i.e., ontologies. By linking metadata fields and their contents to terms contained in well-constructed ontologies, semantic annotation enables several advanced data services, including clarifying a dataset descriptor's specific, finer, and broader meanings and its relationships to other terms. Our services can now resolve synonyms and homonyms, and expand queries to include relevant terms from sub-classes and their instances. This leads to a significant increase in the findability and reusability of data retrieved from participating repositories. We describe how semantic annotation strengthened support for the FAIR principles within our own cyberinfrastructures, and how it may offer a readily implementable way to federate search across multiple scientific data repositories. Realizing the potential of this approach, however, calls for the earth and environmental science communities to converge on how they construct their vocabularies. We will therefore also touch on our experiences examining ontologies, and discuss the benefits of consensus on how to deploy features of W3C-recommended semantic web languages such as RDF, SKOS, and OWL.
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2019
- Bibcode: 2019AGUFMIN22C..19S
- Keywords:
  - 0485 Science policy, BIOGEOSCIENCES
  - 1910 Data assimilation, integration and fusion, INFORMATICS
  - 1924 Formal logics and grammars, INFORMATICS
  - 1970 Semantic web and semantic integration, INFORMATICS
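
The synonym resolution and sub-class query expansion mentioned in the abstract can be illustrated with a minimal sketch. This is not the DataONE or Arctic Data Center implementation: the toy ontology, the synonym table, the dataset annotations, and the function names `expand_term` and `search` are all hypothetical, chosen only to show the general idea of widening a search term to the terms beneath it in a class hierarchy.

```python
from collections import deque

# Hypothetical toy ontology: child term -> parent term
# (stands in for rdfs:subClassOf-style links in a real ontology).
SUBCLASS_OF = {
    "soil_temperature": "temperature",
    "air_temperature": "temperature",
    "water_temperature": "temperature",
    "sea_surface_temperature": "water_temperature",
}

# Hypothetical synonym table: informal label -> canonical ontology term.
SYNONYMS = {"SST": "sea_surface_temperature", "temp": "temperature"}

# Hypothetical semantic annotations: dataset id -> annotated terms.
ANNOTATIONS = {
    "dataset_A": {"sea_surface_temperature"},
    "dataset_B": {"air_temperature"},
    "dataset_C": {"salinity"},
}


def expand_term(term):
    """Resolve a synonym, then collect the term and all of its sub-classes."""
    root = SYNONYMS.get(term, term)

    # Invert the child -> parent map so descendants can be walked.
    children = {}
    for child, parent in SUBCLASS_OF.items():
        children.setdefault(parent, []).append(child)

    # Breadth-first walk down the hierarchy from the resolved term.
    expanded = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in expanded:
                expanded.add(child)
                queue.append(child)
    return expanded


def search(term):
    """Return ids of datasets annotated with the term or any sub-class of it."""
    wanted = expand_term(term)
    return sorted(ds for ds, terms in ANNOTATIONS.items() if terms & wanted)


if __name__ == "__main__":
    print(search("temp"))  # broad term: matches datasets annotated with sub-classes
    print(search("SST"))   # synonym: resolved to its canonical term first
```

In this sketch, a search on a broad term also returns datasets annotated only with narrower terms, which a plain keyword match would miss. A production system would read the hierarchy and labels from an ontology expressed in RDF, SKOS, or OWL rather than from hand-written dictionaries.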