Geoscientific Data Integration Occurs at the Workflow Layer, Requiring FAIR for all Aspects of an Analysis Workflow and More Data Repository Integration: an EarthCube view.
Abstract
Modern scientific workflows have improved dramatically in many geoscience subdomains through the adoption of digital tools to record, describe, compile, archive, and analyze data. In US federally funded geoscience research, there is general agreement on the need for secure, persistent data repositories to archive data and advance our science through data reuse, but there is no agreed-upon path for the methodologies, structures, descriptors, and physical infrastructure that would lead to cross-domain geoscience data access for earth-systems research. Instead, a series of domain-specific repositories, some decades old and with variable funding and sustainability models, have developed, generally providing targeted data services and tools to their user groups in an effective yet siloed way. In the absence of a coherent open data standard, cross-domain and multi-domain science workflows based on the simultaneous sharing and use of data across multiple repositories are very difficult to carry out. While some repositories have moved to adopt the recently proposed FAIR (Findable, Accessible, Interoperable, Reusable) guidelines, the landscape is uneven, and adoption is slow in an environment of limited financial resources. Yet even if every major geoscience data repository were FAIR, the difficulty scientists face in querying data across these systems, let alone using and analyzing the data together, demonstrates a critical, unaddressed data infrastructure need. Substantial investments in infrastructure and standards adoption are needed. In short, every piece of a scientific workflow, including data resources and analysis tools, should also interoperate under FAIR.
As an interim solution, EarthCube has started, through GeoCODES, to register information about US data center holdings and data analysis tools without requiring too drastic a change at individual repositories. The aim is to weave an interoperability layer on top of this federated, domain-tailored data network, primarily through the application of cloud-implementable data markup and exchange strategies. EarthCube also plans to enable workflow support via a tool integration platform, allowing multiple and flexible access points to discover, obtain, and use heterogeneous geoscience data together. But more work is needed to move us to a more coherent, streamlined, interoperable, and open geoscience data infrastructure.
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2019
- Bibcode: 2019AGUFMIN32A..03R
- Keywords: 1908 Cyberinfrastructure (INFORMATICS); 1910 Data assimilation, integration and fusion (INFORMATICS); 1936 Interoperability (INFORMATICS); 1974 Social networks (INFORMATICS)