Reproducibility and Performance: Why Choose?
Abstract
Research processes often rely on high-performance computing (HPC), yet HPC is commonly seen as antithetical to reproducibility: one supposedly has to choose between software that achieves high performance and software that can be deployed reproducibly. Giving up on reproducibility, however, would mean giving up on verifiability, a foundation of the scientific process. How can we reconcile performance and reproducibility? This article looks at two performance-critical aspects of HPC: message passing (MPI) and CPU micro-architecture tuning. Engineering work on performance portability has already proved fruitful, but some areas of CPU tuning remain unaddressed. We propose package multi-versioning, a technique developed for GNU Guix, a tool for reproducible software deployment, and show that it allows us to implement CPU tuning without compromising on reproducibility and provenance tracking.
- Publication:
- Computing in Science and Engineering
- Pub Date:
- May 2022
- DOI:
- 10.1109/MCSE.2022.3165626
- arXiv:
- arXiv:2203.07953
- Bibcode:
- 2022CSE....24c..77C
- Keywords:
- Computer Science - Distributed, Parallel, and Cluster Computing;
- Computer Science - Software Engineering
- E-Print:
- Computing in Science and Engineering, Institute of Electrical and Electronics Engineers, In press