Mechanism to Mitigate AVX-Induced Frequency Reduction
Abstract
Modern Intel CPUs reduce their frequency when executing wide vector operations (AVX2 and AVX-512 instructions), as these instructions increase power consumption. The frequency is only increased again two milliseconds after the last code section containing such instructions has been executed in order to prevent excessive numbers of frequency changes. Due to this delay, intermittent use of wide vector operations can slow down the rest of the system significantly. For example, previous work has shown the performance of web servers to be reduced by up to 10% if the SSL library uses AVX-512 vector instructions. These performance variations are hard to predict during software development as the performance impact of vectorization depends on the specific workload. We describe a mechanism to reduce the slowdown caused by wide vector instructions without requiring extensive changes to existing software. Our design allows the developer to mark problematic AVX code regions. The scheduler then restricts execution of this code to a subset of the cores so that only these cores' frequency is affected. Threads are automatically migrated to a suitable core whenever necessary. We identify a suitable load balancing policy to ensure good utilization of all available cores. Our approach is able to reduce the performance variability caused by AVX2 and AVX-512 instructions by over 70%.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2018
- DOI:
- arXiv:
- arXiv:1901.04982
- Bibcode:
- 2019arXiv190104982G
- Keywords:
-
- Computer Science - Distributed;
- Parallel;
- and Cluster Computing;
- Computer Science - Performance
- E-Print:
- 12 pages, 7 figures