Distributed-memory $\mathcal{H}$-matrix Algebra I: Data Distribution and Matrix-vector Multiplication

doi:10.48550/arXiv.2008.12441

Distributed-memory $\mathcal{H}$-matrix Algebra I: Data Distribution and Matrix-vector Multiplication

We introduce a data distribution scheme for $\mathcal{H}$-matrices and a distributed-memory algorithm for $\mathcal{H}$-matrix-vector multiplication. Our data distribution scheme avoids an expensive $\Omega(P^2)$ scheduling procedure used in previous work, where $P$ is the number of processes, while data balancing is well-preserved. Based on the data distribution, our distributed-memory algorithm evenly distributes all computations among $P$ processes and adopts a novel tree-communication algorithm to reduce the latency cost. The overall complexity of our algorithm is $O\Big(\frac{N \log N}{P} + \alpha \log P + \beta \log^2 P \Big)$ for $\mathcal{H}$-matrices under weak admissibility condition, where $N$ is the matrix size, $\alpha$ denotes the latency, and $\beta$ denotes the inverse bandwidth. Numerically, our algorithm is applied to address both two- and three-dimensional problems of various sizes among various numbers of processes. On thousands of processes, good parallel efficiency is still observed.

Publication:

arXiv e-prints

Pub Date:

August 2020

DOI:

10.48550/arXiv.2008.12441

arXiv:

arXiv:2008.12441

Bibcode:

2020arXiv200812441L

Keywords:

Mathematics - Numerical Analysis;
Computer Science - Distributed;
Parallel;
and Cluster Computing;
65F99;
65Y05

NASA/ADS

Distributed-memory $\mathcal{H}$-matrix Algebra I: Data Distribution and Matrix-vector Multiplication

Abstract