Boosting Research Reproducibility: Managing High Performance Model Simulation Workflow Using Jupyter Notebook
Abstract
Jupyter, an open-source, interactive, web-based notebook, has become an increasingly popular tool for conducting scientific research. It combines software code, narrative text, and computational outputs in a single document, so researchers can easily rerun a notebook and reproduce previous results. However, reproducibility becomes challenging when a project involves a dozen notebooks, each with its own function, which is common for a relatively complex task. In this tutorial, we will demonstrate how to use Jupyter notebooks to manage the workflow of high-performance model simulation (i.e., running parallel simulations on supercomputing clusters). For illustrative purposes, we use PFLOTRAN, a massively parallel subsurface simulator, as an example to show the workflow for setting up a groundwater model, running the simulation, and post-processing the model outputs. We aim to provide interactive documentation that people with little or no experience running computational models can follow to reproduce our work. Most importantly, this workflow also applies to other computational tasks that involve many components. Finally, we will cover the basics of using version control tools such as Git to keep track of changes and share the notebooks on GitHub. We will also share some tips on running Jupyter notebooks on free cloud services such as Binder and Google Colab, which make it easier and faster to run notebooks and share reproducible results.
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2019
- Bibcode: 2019AGUFMED53F0901S
- Keywords:
  - 0820 Curriculum and laboratory design (EDUCATION)
  - 0825 Teaching methods (EDUCATION)
  - 9810 New fields (not classifiable under other headings) (GENERAL OR MISCELLANEOUS)
  - 9820 Techniques applicable in three or more fields (GENERAL OR MISCELLANEOUS)