Batched DGEMMs for scientific codes running on long vector architectures
Abstract
In this work, we evaluate the performance of SeisSol, a simulator of seismic wave phenomena and earthquake dynamics, on a RISC-V-based system utilizing a vector processing unit. We focus on GEMM libraries and address their limited ability to leverage long vector architectures by developing a batched DGEMM library in plain C. This library achieves speedups ranging from approximately 3.5x to 32.6x compared to the reference implementation. We then integrate the batched approach into the SeisSol application, ensuring portability across different CPU architectures. Lastly, we demonstrate that our implementation is portable to an Intel CPU, resulting in improved execution times in most cases.
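To make the batched idea concrete, here is a minimal sketch in plain C of one way a batched DGEMM can expose long unit-stride loops to a vectorizing compiler. It is an illustration under stated assumptions, not the library described above: the function name `batched_dgemm_interleaved` and the interleaved storage layout (element (i, j) of matrix b stored at index `(i * cols + j) * batch + b`) are hypothetical choices made for this example. With this layout the innermost loop runs over the batch, so its trip count grows with the number of matrices instead of being limited by the small matrix dimensions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Batched DGEMM sketch: C[b] = alpha * A[b] * B[b] + beta * C[b]
 * for every matrix index b in [0, batch).
 *
 * ASSUMED layout (hypothetical, chosen for this example): matrices are
 * interleaved across the batch, so element (i, j) of matrix b is stored
 * at X[(i * cols + j) * batch + b]. A is m x k, B is k x n, C is m x n.
 */
void batched_dgemm_interleaved(size_t m, size_t n, size_t k, size_t batch,
                               double alpha, const double *restrict A,
                               const double *restrict B, double beta,
                               double *restrict C)
{
    assert(A != NULL && B != NULL && C != NULL);

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double *c = &C[(i * n + j) * batch];

            /* Scale the existing entry (i, j) of every matrix in the batch.
               With beta == 0 the old contents of C are overwritten, not read. */
            for (size_t b = 0; b < batch; ++b)
                c[b] = (beta == 0.0) ? 0.0 : beta * c[b];

            for (size_t p = 0; p < k; ++p) {
                const double *a  = &A[(i * k + p) * batch];
                const double *bp = &B[(p * n + j) * batch];

                /* Unit-stride loop over the batch: this is the loop a
                   compiler can map onto long vector registers. */
                for (size_t b = 0; b < batch; ++b)
                    c[b] += alpha * a[b] * bp[b];
            }
        }
    }
}
```

Because the sketch relies only on standard C and compiler auto-vectorization, the same source can be built for different CPU architectures. Whether a given compiler actually vectorizes the batch loop depends on the toolchain and flags used.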
- Publication: arXiv e-prints
- Pub Date: January 2025
- arXiv: arXiv:2501.06175
- Bibcode: 2025arXiv250106175B
- Keywords: Computer Science - Distributed, Parallel, and Cluster Computing
- E-Print: Accepted at the First PPAM Workshop on RISC-V (PPAM24)