Mapping PetaSHA Applications to TeraGrid Architectures
Abstract
The Southern California Earthquake Center (SCEC) has a science program in developing an integrated cyberfacility - PetaSHA - for executing physics-based seismic hazard analysis (SHA) computations. The NSF has awarded PetaSHA 15 million allocation service units this year on the fastest supercomputers available within the NSF TeraGrid. However, one size does not fit all, a range of systems are needed to support this effort at different stages of the simulations. Enabling PetaSHA simulations on those TeraGrid architectures to solve both dynamic rupture and seismic wave propagation have been a challenge from both hardware and software levels. This is an adaptation procedure to meet specific requirements of each architecture. It is important to determine how fundamental system attributes affect application performance. We present an adaptive approach in our PetaSHA application that enables the simultaneous optimization of both computation and communication at run-time using flexible settings. These techniques optimize initialization, source/media partition and MPI-IO output in different ways to achieve optimal performance on the target machines. The resulting code is a factor of four faster than the orignial version. New MPI-I/O capabilities have been added for the accurate Staggered-Grid Split-Node (SGSN) method for dynamic rupture propagation in the velocity-stress staggered-grid finite difference scheme (Dalguer and Day, JGR, 2007), We use execution workflow across TeraGrid sites for managing the resulting data volumes. Our lessons learned indicate that minimizing time to solution is most critical, in particular when scheduling large scale simulations across supercomputer sites. The TeraShake platform has been ported to multiple architectures including TACC Dell lonestar and Abe, Cray XT3 Bigben and Blue Gene/L. Parallel efficiency of 96% with the PetaSHA application Olsen-AWM has been demonstrated on 40,960 Blue Gene/L processors at IBM TJ Watson Center. Notable accomplishments using the optimized code include the M7.8 ShakeOut rupture scenario, as part of the southern San Andreas Fault evaluation SoSAFE. The ShakeOut simulation domain is the same as used for the SCEC TeraShake simulations (600 km by 300 km by 80 km). However, the higher resolution of 100 m with frequency content up to 1 Hz required 14.4 billion grid points, eight times more than the TeraShake scenarios. The simulation used 2000 TACC Dell linux Lonestar processors and took 56 hours to compute 240 seconds of wave propagation. The pre-processing input partition, as well as post-processing analysis has been performed on the SDSC IBM Datastar p655 and p690. In addition, as part of the SCEC DynaShake computational platform, the SGSN capability was used to model dynamic rupture propagation for the ShakeOut scenario that match the proposed surface slip and size of the event. Mapping applications to different architectures require coordination of many areas of expertise in hardware and application level, an outstanding challenge faced on the current petascale computing effort. We believe our techniques as well as distributed data management through data grids have provided a practical example of how to effectively use multiple compute resources, and our results will benefit other geoscience disciplines as well.
- Publication:
-
AGU Fall Meeting Abstracts
- Pub Date:
- December 2007
- Bibcode:
- 2007AGUFMIN21B0483C
- Keywords:
-
- 0525 Data management;
- 0545 Modeling (4255);
- 0902 Computational methods: seismic;
- 1706 Computational geophysics;
- 1734 Seismology