Physical Samples Linked Data in Action
Abstract
Most data and metadata related to physical samples currently reside in isolated relational databases driven by diverse data models. How to approach the challenge for sharing, interchanging and integrating data from these difference relational databases motivated us to publish Linked Open Data for collections of physical samples, using Semantic Web technologies including the Resource Description Framework (RDF), RDF Query Language (SPARQL), and Web Ontology Language (OWL). In last few years, we have released four knowledge graphs concentrated on physical samples, including System for Earth Sample Registration (SESAR), USGS National Geochemical Database (NGDC), Ocean Biogeographic Information System (OBIS), and Earthchem Database. Currently the four knowledge graphs contain over 12 million facets (triples) about objects of interest to the geoscience domain. Choosing appropriate domain ontologies for representing context of data is the core of the whole work. Geolink ontology developed by Earthcube Geolink project was used as top level to represent common concepts like person, organization, cruise, etc. Physical sample ontology developed by Interdisciplinary Earth Data Alliance (IEDA) and Darwin Core vocabulary were used as second level to describe details about geological samples and biological diversity. We also focused on finding and building best tool chains to support the whole life cycle of publishing linked data we have, including information retrieval, linked data browsing and data visualization. Currently, Morph, Virtuoso Server, LodView, LodLive, and YASGUI were employed for converting, storing, representing, and querying data in a knowledge base (RDF triplestore). Persistent digital identifier is another main point we concentrated on. Open Researcher & Contributor IDs (ORCIDs), International Geo Sample Numbers (IGSNs), Global Research Identifier Database (GRID) and other persistent identifiers were used to link different resources from various graphs with person, sample, organization, cruise, etc. This work is supported by the EarthCube "GeoLink" project (NSF# ICER14-40221 and others) and the "USGS-IEDA Partnership to Support a Data Lifecycle Framework and Tools" project (USGS# G13AC00381).
- Publication:
-
AGU Fall Meeting Abstracts
- Pub Date:
- December 2017
- Bibcode:
- 2017AGUFMIN33B0126J
- Keywords:
-
- 1916 Data and information discovery;
- INFORMATICS;
- 1930 Data and information governance;
- INFORMATICS;
- 1970 Semantic web and semantic integration;
- INFORMATICS;
- 1986 Statistical methods: Inferential;
- INFORMATICS