Sharing Science with Journals, Peers, and Skeptics: A Reproducible Approach in the Era of Big Data
Abstract
With ever-increasing processing speed and storage capacity, computing power has catapulted hydrologic modelers into a brave new world. Researchers often cannot easily provide all of the forcing datasets from a modeling experiment to those who wish to test and expand on the final study conclusions. Relatively new journal requirements for data availability have accelerated the need for new ways of packaging and disseminating large datasets. This work presents the pitfalls and accomplishments of our approach to reproducible science for a multi-year modeling study in the Sierra Nevada. The snow model used in the study, iSnobal, is a distributed, physically based energy balance model that requires gridded meteorological forcing inputs. At a 50 m spatial resolution and hourly temporal resolution, the gridded forcing datasets occupy over 3 TB of disk space for the 4-year time period. We show how Docker containers were used to package the vector data from station measurements and to freeze the model and its code dependencies as they were when the original forcing data were produced. Any operating system that can run Docker can then be used to run the same interpolation framework and produce the same forcing grids that were used in our journal publication. This approach is a small step toward making hydrologic science more reproducible and transparent.
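One way a reader might check that regenerated forcing grids match a published set is to compare file checksums. The Python sketch below is illustrative only and is not taken from the study: the function names and the idea of a checksum manifest are assumptions introduced here. Byte-identical output is also a strict criterion. Floating-point results can differ slightly across hardware, so a comparison with a numerical tolerance may be more appropriate in practice.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to `directory`) to its SHA-256 digest."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(reference, regenerated):
    """Report files that are missing, unexpected, or different in the regenerated set."""
    shared = set(reference) & set(regenerated)
    return {
        "missing": sorted(set(reference) - set(regenerated)),
        "unexpected": sorted(set(regenerated) - set(reference)),
        "changed": sorted(n for n in shared if reference[n] != regenerated[n]),
    }
```

A manifest built once from the original grids is only a few kilobytes, so it could be distributed alongside a container even when the grids themselves are too large to share. A user would then call `build_manifest` on the locally regenerated output directory and pass both manifests to `compare_manifests`.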
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2018
- Bibcode: 2018AGUFM.C13J1254H
- Keywords:
  - 0798 Modeling; CRYOSPHERE
  - 1805 Computational hydrology; HYDROLOGY
  - 1920 Emerging informatics technologies; INFORMATICS
  - 1978 Software re-use; INFORMATICS