Science, containerized: Integrating provenance and compute environments with the Whole Tale
Abstract
Reproducibility is a top concern across the sciences, and work on provenance standards and tooling is central to this concern. In the computational sciences, provenance describes the origin and processing history of data and workflow products, providing transparency and facilitating reproducibility of computational studies. The availability of workflow systems (providing prospective provenance) and ontologies (providing semantics) can further enhance provenance information. However, existing provenance tools may not capture sufficient information about the computational environment in which the original research was produced, leaving it up to the user to set up a computational environment to reproduce the result. In practice, this is a daunting task and a consistent point of failure. Recent developments in computational reproducibility, such as the popularization of containers for bootstrapping computational environments, offer a critical linkage between existing provenance tooling and computational environments. While seeing increased adoption across the sciences, container technology may not be approachable to the majority of scientists and does not, by itself, capture sufficient provenance information for reproducibility. The Whole Tale project bridges this gap by enabling tight integration between analyses, the generation of provenance information, and an executable description of the computational environment. Users interact with Whole Tale through their web browsers to create Tales, which are composed of input data, scripts, output data, detailed provenance information, and a description of the computational environment. Tales can be published outside of Whole Tale as first-class, citable research products that can be run standalone or imported back into Whole Tale for another user to investigate and possibly extend.
Whole Tale assists in capturing provenance by providing a natural, web-based user interface for collecting additional provenance, requiring minimal pre-existing knowledge on the part of the user. By tightly integrating provenance tooling with a description of the computational environment through a unified, convenient web interface, Whole Tale signifies a substantial leap forward for reproducibility.
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2018
- Bibcode: 2018AGUFMIN53A..02M
- Keywords:
  - 1908 Cyberinfrastructure; INFORMATICS
  - 1916 Data and information discovery; INFORMATICS
  - 1948 Metadata: Provenance; INFORMATICS
  - 1976 Software tools and services; INFORMATICS