Triangel: A High-Performance, Accurate, Timely On-Chip Temporal Prefetcher
Abstract
Temporal prefetching, where correlated pairs of addresses are logged and replayed on repeat accesses, has recently become viable in commercial designs. Arm's latest processors include Correlating Miss Chaining prefetchers, which store such patterns in a partition of the on-chip cache. However, the state-of-the-art on-chip temporal prefetcher in the literature, Triage, features some design inconsistencies and inaccuracies that pose challenges for practical implementation. We first examine and design fixes for these inconsistencies to produce an implementable baseline. We then introduce Triangel, a prefetcher that extends Triage with novel sampling-based methodologies to allow it to be aggressive and timely when the prefetcher is able to handle observed long-term patterns, and to avoid inaccurate prefetches when less able to do so. Triangel gives a 26.4% speedup compared to a baseline system with a conventional stride prefetcher alone, compared with 9.3% for Triage at degree 1 and 14.2% at degree 4. At the same time Triangel only increases memory traffic by 10% relative to baseline, versus 28.5% for Triage.
- Publication:
-
arXiv e-prints
- Pub Date:
- June 2024
- DOI:
- arXiv:
- arXiv:2406.10627
- Bibcode:
- 2024arXiv240610627A
- Keywords:
-
- Computer Science - Hardware Architecture
- E-Print:
- To be published at ISCA 2024