GRACOS: Scalable and Load Balanced P3M Cosmological N-body Code
Abstract
We present a parallel implementation of the particle-particle/particle-mesh (P3M) algorithm for distributed-memory clusters. The GRACOS (GRAvitational COSmology) code uses a hybrid method for both computation and domain decomposition. Long-range forces are computed using a Fourier transform gravity solver on a regular mesh; the mesh is distributed across parallel processes using a static one-dimensional slab domain decomposition. Short-range forces are computed by direct summation of close pairs; particles are distributed using a dynamic domain decomposition based on a space-filling Hilbert curve. A nearly optimal method was devised to dynamically repartition the particle distribution so as to maintain load balance even for extremely inhomogeneous mass distributions. Tests using $800^3$ simulations on a 40-processor Beowulf cluster showed good load balance and scalability up to 80 processes. We discuss the limits on scalability imposed by communication and extreme clustering and suggest how they may be removed by extending our algorithm to include adaptive mesh refinement.
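The Hilbert-curve decomposition described in the abstract can be illustrated with a small sketch. This is not the GRACOS implementation: it assumes a two-dimensional grid rather than the three-dimensional simulation volume, it uses the standard bit-manipulation mapping from cell coordinates to a Hilbert index, and it splits the curve with a simple prefix-sum rule rather than the nearly optimal repartitioning method the authors devised. The names `hilbert_index` and `partition_by_work` are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along a 2D Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_by_work(work, nproc):
    """Split cells, already ordered along the curve, into nproc contiguous
    segments of roughly equal total work.

    Assumption: a plain prefix-sum split, used here only for illustration.
    Returns nproc + 1 cut positions; process k owns work[cuts[k]:cuts[k + 1]].
    """
    prefix = list(accumulate(work))
    total = prefix[-1]
    cuts = [0]
    for k in range(1, nproc):
        target = total * k / nproc
        # First cell at which the cumulative work reaches the target.
        cuts.append(bisect_left(prefix, target) + 1)
    cuts.append(len(work))
    return cuts


if __name__ == "__main__":
    n = 4
    cells = sorted(
        ((x, y) for x in range(n) for y in range(n)),
        key=lambda c: hilbert_index(n, c[0], c[1]),
    )
    # Hypothetical per-cell work estimate: one clustered corner is expensive.
    work = [10 if (x >= 2 and y >= 2) else 1 for x, y in cells]
    cuts = partition_by_work(work, 4)
    for k in range(4):
        owned = cells[cuts[k]:cuts[k + 1]]
        print(k, sum(work[cuts[k]:cuts[k + 1]]), owned)
```

Because consecutive cells along the curve are spatial neighbours, each contiguous segment is a compact region, and rebalancing only requires moving the cut positions along the curve as the work estimates change.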
- Publication: arXiv e-prints
- Pub Date: May 2005
- DOI: 10.48550/arXiv.astro-ph/0505087
- arXiv: arXiv:astro-ph/0505087
- Bibcode: 2005astro.ph..5087S
- Keywords: Astrophysics
- E-Print: to be submitted to ApJ. S