Performance Optimisation of Smoothed Particle Hydrodynamics Algorithms for Multi/Many-Core Architectures
Abstract
We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core Intel® architectures. We identify and isolate a sample code kernel that is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm. The code modifications include threading parallelism optimisation, a change of the data layout from Array of Structures (AoS) into Structure of Arrays (SoA), auto-vectorisation, and algorithmic improvements in the particle sorting. We obtain shorter execution times and improved threading scalability on both Intel Xeon® ($2.6 \times$ on Ivy Bridge) and Xeon Phi™ ($13.7 \times$ on Knights Corner) systems. Initial tests of the optimised code show $19.1 \times$ faster execution on the second-generation Xeon Phi (Knights Landing), demonstrating the portability of the devised optimisation solutions to upcoming architectures.
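The data-layout change named in the abstract can be illustrated with a minimal sketch in C. This is not Gadget's code: the type names `ParticleAoS` and `ParticlesSoA`, the fields `x`, `y`, `z`, `mass`, and both functions are hypothetical and chosen only for illustration. The sketch assumes a distance computation as a stand-in for an SPH neighbour loop, and shows how storing each field in its own contiguous array gives the compiler unit-stride loads that are easier to auto-vectorise.

```c
#include <assert.h>
#include <stddef.h>

/* Array of Structures (AoS): one struct per particle.
   Hypothetical fields, not Gadget's actual particle data. */
typedef struct {
    float x, y, z;
    float mass;
} ParticleAoS;

/* Structure of Arrays (SoA): one contiguous array per field. */
typedef struct {
    float *x, *y, *z;
    float *mass;
} ParticlesSoA;

/* Copy n particles from the AoS layout into caller-allocated SoA arrays. */
void aos_to_soa(const ParticleAoS *in, ParticlesSoA *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out->x[i]    = in[i].x;
        out->y[i]    = in[i].y;
        out->z[i]    = in[i].z;
        out->mass[i] = in[i].mass;
    }
}

/* Squared distance of every particle from the point (cx, cy, cz).
   Each array is read with unit stride and the output does not alias the
   inputs (restrict), so a compiler is free to vectorise the loop. */
void squared_distances_soa(const ParticlesSoA *p, size_t n,
                           float cx, float cy, float cz,
                           float *restrict r2)
{
    const float *restrict x = p->x;
    const float *restrict y = p->y;
    const float *restrict z = p->z;

    for (size_t i = 0; i < n; ++i) {
        const float dx = x[i] - cx;
        const float dy = y[i] - cy;
        const float dz = z[i] - cz;
        r2[i] = dx * dx + dy * dy + dz * dz;
    }
}
```

In the AoS layout the same loop would read `x`, `y` and `z` with a stride of `sizeof(ParticleAoS)`, which typically forces gather-style loads. Whether a given compiler actually vectorises the SoA loop depends on its flags and target, so its vectorisation report should be checked rather than assumed.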
- Publication: arXiv e-prints
- Pub Date: December 2016
- DOI: 10.48550/arXiv.1612.06090
- arXiv: arXiv:1612.06090
- Bibcode: 2016arXiv161206090B
- Keywords: Computer Science - Distributed, Parallel, and Cluster Computing; Astrophysics - Instrumentation and Methods for Astrophysics; Physics - Computational Physics
- E-Print: 8 pages, 2 columns, 4 figures, accepted as paper at HPCS Proceedings 2017, IEEE XPLORE