Community-managed Data Sharing, Curation, and Publication: SEN on SEAD
Abstract
While data publication in support of reuse and scientific reproducibility is increasingly being recognized as a key aspect of modern research practice, best practices are still to be developed at the level of scientific communities. Often, such practices are discussed in the abstract - as community standards for data plans or as requirements for yet-to-be-built software - with no clear path to community adoption. In contrast, the Sediment Experimentalist Network (SEN), supported through the National Science Foundation's (NSF) EarthCube initiative, has encouraged an iterative, practice-based approach within its community that has resulted in the publication of dozens of datasets, comprising millions of files totaling more than 4 TB in size, and the documentation of more than 100 experimental procedures, instruments, and facilities, by multiple research teams. A key element of SEN's approach has been to leverage cloud-based data services that provide robust core capabilities with community-based management and customization capabilities. These services - data sharing, curation, and publication services developed through the NSF-supported Sustainable Environment - Actionable Data (SEAD) project and the wiki-based SEN Knowledge Base (KB) - have allowed the SEN team to ground discussions in reality and to use the practical questions arising as researchers publish data to drive discussion and evolve towards better practices. In this presentation we summarize how SEN interacts with researchers, the best practices that have been developed, and the capabilities of SEAD and the SEN KB that support them.
We also describe issues that have arisen in the community - related, for example, to recommended and required metadata; individual, project, and community branding; and data version and derivation relationships - and describe how SEN's outreach activities, collaboration with the SEAD team, and the flexible design of the data services themselves have, in combination, been able to provide rapid incremental solutions to support researchers' needs while also helping the community align with broader semantic and data publication standards. We conclude with thoughts on how this approach could be applied in other communities as a way to drive progress towards data reuse and reproducible research.
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2017
- Bibcode: 2017AGUFMED32B..01M
- Keywords:
  - 0850 Geoscience education research; EDUCATION
  - 1908 Cyberinfrastructure; INFORMATICS
  - 1912 Data management, preservation, rescue; INFORMATICS
  - 1916 Data and information discovery; INFORMATICS