A GPU Register File using Static Data Compression
Abstract
GPUs rely on large register files to unlock thread-level parallelism for high throughput. Unfortunately, large register files are power hungry, which makes it important to find new approaches that improve their utilization. This paper introduces a new register file organization for efficient register packing of narrow integer and floating-point operands, designed to leverage advances in static analysis. We show that the hardware/software co-designed register file organization yields a performance improvement of up to 79%, and 18.6% on average, at the cost of a modest degradation in output quality.
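To make the idea of register packing concrete, the following is a minimal sketch of how two narrow integer operands could share one physical register. It is an illustration only and not the paper's design: the 32-bit register width, the 16-bit narrow width, and the helper names `fits_narrow`, `pack`, and `unpack` are all assumptions introduced here.

```python
from typing import Tuple

# Assumed widths for illustration; the paper's actual layout may differ.
REG_BITS = 32
NARROW_BITS = 16
MASK = (1 << NARROW_BITS) - 1


def fits_narrow(value: int) -> bool:
    """Return True if a signed integer fits in NARROW_BITS bits (two's complement)."""
    limit = 1 << (NARROW_BITS - 1)
    return -limit <= value < limit


def pack(lo: int, hi: int) -> int:
    """Pack two narrow signed integers into one REG_BITS-wide register word."""
    if not (fits_narrow(lo) and fits_narrow(hi)):
        raise ValueError("operand does not fit in the narrow width")
    return ((hi & MASK) << NARROW_BITS) | (lo & MASK)


def unpack(word: int) -> Tuple[int, int]:
    """Recover the two signed narrow operands (lo, hi) from a packed word."""
    def sign_extend(x: int) -> int:
        # Reinterpret the NARROW_BITS-wide field as a signed value.
        return x - (1 << NARROW_BITS) if x & (1 << (NARROW_BITS - 1)) else x

    return sign_extend(word & MASK), sign_extend((word >> NARROW_BITS) & MASK)


if __name__ == "__main__":
    word = pack(-3, 1000)
    print(hex(word), unpack(word))
```

In this sketch, a static analysis that proves both operands stay within the narrow range would play the role of `fits_narrow`, allowing the pair to occupy a single register instead of two. Narrow floating-point operands would additionally require reducing precision, which is one plausible source of the output-quality degradation the abstract mentions.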
- Publication: arXiv e-prints
- Pub Date: June 2020
- DOI: 10.48550/arXiv.2006.05693
- arXiv: arXiv:2006.05693
- Bibcode: 2020arXiv200605693A
- Keywords: Computer Science - Hardware Architecture
- E-Print: Accepted to ICPP'20