Optimising GPGPU Execution Through Runtime Micro-Architecture Parameter Analysis

Abstract:

GPGPU execution analysis has always been tied to closed-source, proprietary benchmarking tools that provide high-level, non-exhaustive, and/or statistical information, preventing a thorough understanding of bottlenecks and optimization possibilities. Open-source hardware platforms offer opportunities to overcome such limits and co-optimize the full {hardware-mapping-algorithm} compute stack. Yet, so far, this has remained under-explored. In this work, we exploit micro-architecture parameter analysis to develop a hardware-aware, runtime mapping technique for OpenCL kernels on the open Vortex RISC-V GPGPU. Our method is based on trace observations and targets optimal hardware resource utilization to achieve superior performance and flexibility compared to hardware-agnostic mapping approaches. The technique was validated on different architectural GPU configurations across several OpenCL kernels. Overall, our approach significantly enhances the performance of the open-source Vortex GPGPU, contributing to unlocking its potential and usability.

Publication:

arXiv e-prints

Pub Date:

June 2024

DOI:

10.48550/arXiv.2407.11999

arXiv:

arXiv:2407.11999

Bibcode:

2024arXiv240711999S

Keywords:

Computer Science - Hardware Architecture

E-Print:

2023 IEEE International Symposium on Workload Characterization (IISWC)
