DataFed: Towards Reproducible Research via Federated Data Management
Abstract
The increasingly collaborative, globalized nature of scientific research combined with the need to share data and the explosion in data volumes present an urgent need for a scientific data management system (SDMS). An SDMS presents a logical and holistic view of data that greatly simplifies and empowers data organization, curation, searching, sharing, dissemination, etc. We present DataFed -- a lightweight, distributed SDMS that spans a federation of storage systems within a loosely-coupled network of scientific facilities. Unlike existing SDMS offerings, DataFed uses high-performance and scalable user management and data transfer technologies that simplify deployment, maintenance, and expansion of DataFed. DataFed provides web-based and command-line interfaces to manage data and integrate with complex scientific workflows. DataFed represents a step towards reproducible scientific research by enabling reliable staging of the correct data at the desired environment.
- Publication:
-
arXiv e-prints
- Pub Date:
- April 2020
- DOI:
- 10.48550/arXiv.2004.03710
- arXiv:
- arXiv:2004.03710
- Bibcode:
- 2020arXiv200403710S
- Keywords:
-
- Computer Science - Databases;
- Computer Science - Computers and Society
- E-Print:
- Part of conference proceedings at the 6th Annual Conference on Computational Science &