Cost optimized ab initio tensor network state methods: industrial perspectives
Abstract
We introduce efficient solutions for optimizing the cost of tree-like tensor network state calculations on expensive GPU-accelerated hardware. By supporting a powerful main compute node with auxiliary, much cheaper nodes that store intermediate, precontracted tensor network scratch data, the IO time can be hidden almost entirely behind the computation without increasing the memory peak. Our solution exploits the different bandwidths of the available communication channels (such as NVLink, PCIe, and InfiniBand) and storage media, which are utilized at different layers of the algorithm. This simple heterogeneous multi-node solution, based on asynchronous IO operations, has the potential to minimize IO overhead and thereby maximize the performance rate of the main compute unit. In addition, we introduce an in-house developed, massively parallel protocol to serialize and deserialize block-sparse matrices and tensors, which greatly reduces data communication time. Performance profiles are presented for the spin-adapted ab initio density matrix renormalization group method, with corresponding U(1) bond dimension values up to 15400, on the active compounds of FeMoco with complete active space (CAS) sizes of up to 113 electrons in 76 orbitals [CAS(113, 76)].
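The idea of hiding IO time behind computation can be illustrated with a minimal double-buffering sketch. This is not the authors' implementation: `load_block`, `contract`, and `run_overlapped` are hypothetical names, the `time.sleep` calls merely stand in for a transfer from an auxiliary node and for a GPU contraction, and a single background thread plays the role of the asynchronous IO channel.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def load_block(i):
    """Hypothetical stand-in for fetching scratch data from an auxiliary node."""
    time.sleep(0.01)  # simulated IO latency (assumption, not a measured value)
    return [float(i)] * 4


def contract(block):
    """Hypothetical stand-in for the contraction on the main compute node."""
    time.sleep(0.01)  # simulated compute time (assumption)
    return sum(block)


def run_overlapped(n_steps):
    """Process blocks in order while the next block is fetched in the background."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as io_pool:
        pending = io_pool.submit(load_block, 0)  # prefetch the first block
        for i in range(n_steps):
            block = pending.result()  # wait only for the block needed now
            if i + 1 < n_steps:
                # Start fetching the next block before computing on this one.
                pending = io_pool.submit(load_block, i + 1)
            results.append(contract(block))  # compute overlaps with the fetch
    return results


def run_serial(n_steps):
    """Reference version: each fetch finishes before its computation starts."""
    return [contract(load_block(i)) for i in range(n_steps)]


if __name__ == "__main__":
    print(run_overlapped(5))
```

In this sketch at most two blocks are held at once (the one being processed and the one being fetched), so the prefetch depth bounds the extra memory. When the fetch time does not exceed the compute time, the loop is limited by computation rather than by IO.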
- Publication: arXiv e-prints
- Pub Date: December 2024
- arXiv: arXiv:2412.04676
- Bibcode: 2024arXiv241204676M
- Keywords: Physics - Computational Physics; Condensed Matter - Strongly Correlated Electrons; Physics - Chemical Physics
- E-Print: 13 pages, 10 figures