LoAS: Fully Temporal-Parallel Dataflow for Dual-Sparse Spiking Neural Networks

doi:10.48550/arXiv.2407.14073

Spiking Neural Networks (SNNs) have gained significant research attention in the past decade due to their potential to drive resource-constrained edge devices. Though existing SNN accelerators offer high efficiency in processing sparse spikes with dense weights, opportunities are less explored in SNNs with sparse weights, i.e., dual-sparsity. In this work, we study the acceleration of dual-sparse SNNs, focusing on their core operation, sparse-matrix-sparse-matrix multiplication (spMspM). We observe that naively running a dual-sparse SNN on existing spMspM accelerators designed for dual-sparse Artificial Neural Networks (ANNs) exhibits sub-optimal efficiency. The main challenge is that processing timesteps, a natural property of SNNs, adds an extra loop to ANN spMspM, leading to longer latency and more memory traffic. To address this problem, we propose a fully temporal-parallel (FTP) dataflow, which minimizes both data movement across timesteps and the end-to-end latency of dual-sparse SNNs. To maximize the efficiency of FTP dataflow, we propose an FTP-friendly spike compression mechanism that efficiently compresses single-bit spikes and ensures contiguous memory access. We further propose an FTP-friendly inner-join circuit that can lower the cost of the expensive prefix-sum circuits with almost no throughput penalty. All the above techniques for FTP dataflow are encapsulated in LoAS, a Low-latency inference Accelerator for dual-sparse SNNs. With FTP dataflow, compression, and inner-join, running dual-sparse SNN workloads on LoAS demonstrates significant speedup (up to 8.51×) and energy reduction (up to 3.68×) compared to running them on prior dual-sparse accelerators.
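To make the timestep loop and the inner-join concrete, the following is a minimal software sketch of one output neuron's dual-sparse dot product, written under stated assumptions. It assumes a hypothetical bitmask-based format: weights are stored as a nonzero bitmask plus packed values, and the T single-bit spikes of each input position are packed into one T-bit word, keeping only positions that fire at least once. The function names (prefix_popcount, compress_weights, compress_spikes, timestep_sequential_dot, temporal_parallel_dot) are illustrative and are not the LoAS compression format or its inner-join hardware. A popcount over the bits below a position stands in for the prefix-sum that locates a matched element in each compressed array. The baseline wraps a full dot product in a timestep loop, while the temporal-parallel version intersects the two bitmasks once and updates all T outputs for each matched weight, so each weight is fetched a single time.

```python
from typing import List, Tuple

# Hypothetical bitmask-based formats for illustration only; not LoAS's actual layout.


def prefix_popcount(mask: int, pos: int) -> int:
    """Count set bits of `mask` below bit `pos`: the offset of `pos` in a compressed array."""
    return bin(mask & ((1 << pos) - 1)).count("1")


def compress_weights(dense: List[float]) -> Tuple[int, List[float]]:
    """Keep nonzero weights plus a bitmask marking their positions."""
    mask, values = 0, []
    for i, w in enumerate(dense):
        if w != 0:
            mask |= 1 << i
            values.append(w)
    return mask, values


def compress_spikes(spikes: List[List[int]]) -> Tuple[int, List[int]]:
    """Pack the T spikes of each input position into one T-bit word.

    spikes[t][i] is 0 or 1. Positions that never fire are dropped, and the
    returned bitmask records which positions were kept.
    """
    mask, packed = 0, []
    for i in range(len(spikes[0])):
        word = 0
        for t in range(len(spikes)):
            if spikes[t][i]:
                word |= 1 << t
        if word:
            mask |= 1 << i
            packed.append(word)
    return mask, packed


def timestep_sequential_dot(spikes: List[List[int]], weights: List[float]) -> List[float]:
    """Baseline: the extra timestep loop wraps a full dot product, re-reading weights each timestep."""
    return [sum(w for s, w in zip(row, weights) if s) for row in spikes]


def temporal_parallel_dot(s_mask: int, packed: List[int],
                          w_mask: int, w_values: List[float], T: int) -> List[float]:
    """Inner-join the bitmasks once, then update all T timestep outputs per matched weight."""
    out = [0.0] * T
    matched = s_mask & w_mask  # positions nonzero in both operands
    while matched:
        pos = (matched & -matched).bit_length() - 1  # lowest matched position
        matched &= matched - 1                        # clear that bit
        w = w_values[prefix_popcount(w_mask, pos)]    # each matched weight fetched once
        word = packed[prefix_popcount(s_mask, pos)]   # all T spikes for this position
        for t in range(T):
            if (word >> t) & 1:
                out[t] += w
    return out


if __name__ == "__main__":
    spikes = [[1, 0, 1, 0], [0, 0, 1, 1]]  # T = 2 timesteps, 4 inputs
    weights = [0.5, 0.0, -1.0, 2.0]
    s_mask, packed = compress_spikes(spikes)
    w_mask, w_vals = compress_weights(weights)
    print(timestep_sequential_dot(spikes, weights))                            # [-0.5, 1.0]
    print(temporal_parallel_dot(s_mask, packed, w_mask, w_vals, len(spikes)))  # [-0.5, 1.0]
```

Both versions produce the same outputs; the difference lies in how often the sparse operands are traversed. In this sketch the baseline walks the weight vector once per timestep, whereas the temporal-parallel loop walks the intersection once, which is the kind of cross-timestep data reuse the abstract attributes to FTP dataflow.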

Publication:

arXiv e-prints

Pub Date:

July 2024

DOI:

10.48550/arXiv.2407.14073

arXiv:

arXiv:2407.14073

Bibcode:

2024arXiv240714073Y

Keywords:

Computer Science - Hardware Architecture;
Computer Science - Artificial Intelligence;
Computer Science - Neural and Evolutionary Computing

E-Print:

Accepted to MICRO 2024. Will update with the camera-ready version once ready. (Github: https://github.com/RuokaiYin/LoAS)