An Optimized Sparse Approximate Matrix Multiply for Matrices with Decay

doi:10.1137/120870761

An Optimized Sparse Approximate Matrix Multiply for Matrices with Decay

We present an optimized single-precision implementation of the Sparse Approximate Matrix Multiply (\SpAMM{}) [M. Challacombe and N. Bock, arXiv {\bf 1011.3534} (2010)], a fast algorithm for matrix-matrix multiplication for matrices with decay that achieves an $\mathcal{O} (n \log n)$ computational complexity with respect to matrix dimension $n$. We find that the max norm of the error achieved with a \SpAMM{} tolerance below $2 \times 10^{-8}$ is lower than that of the single-precision {\tt SGEMM} for dense quantum chemical matrices, while outperforming {\tt SGEMM} with a cross-over already for small matrices ($n \sim 1000$). Relative to naive implementations of \SpAMM{} using Intel's Math Kernel Library ({\tt MKL}) or AMD's Core Math Library ({\tt ACML}), our optimized version is found to be significantly faster. Detailed performance comparisons are made for quantum chemical matrices with differently structured sub-blocks. Finally, we discuss the potential of improved hardware prefetch to yield 2--3x speedups.

Publication:

SIAM Journal on Scientific Computing

Pub Date:

January 2013

DOI:

10.1137/120870761

arXiv:

arXiv:1203.1692

Bibcode:

2013SJSC...35C..72B

Keywords:

Computer Science - Numerical Analysis;
Computer Science - Data Structures and Algorithms;
Computer Science - Mathematical Software;
65F15;
65-04;
65Z15;
15-04

NASA/ADS

An Optimized Sparse Approximate Matrix Multiply for Matrices with Decay

Abstract