ACCO: Automated Causal CNN Scheduling Optimizer for Real-Time Edge Accelerators
Abstract
Spatio-Temporal Convolutional Neural Networks (ST-CNN) allow extending CNN capabilities from image processing to consecutive temporal-pattern recognition. Generally, state-of-the-art (SotA) ST-CNNs inflate the feature maps and weights from well-known CNN backbones to represent the additional time dimension. However, edge computing applications would suffer tremendously from such large computation or memory overhead. Fortunately, the overlapping nature of ST-CNN enables various optimizations, such as the dilated causal convolution structure and Depth-First (DF) layer fusion to reuse the computation between time steps and CNN sliding windows, respectively. Yet, no hardware-aware approach has been proposed that jointly explores the optimal strategy from a scheduling as well as a hardware point of view. To this end, we present ACCO, an automated optimizer that explores efficient Causal CNN transformation and DF scheduling for ST-CNNs on edge hardware accelerators. By cost-modeling the computation and data movement on the accelerator architecture, ACCO automatically selects the best scheduling strategy for the given hardware-algorithm target. Compared to the fixed dilated causal structure, ST-CNNs with ACCO reach an ~8.4x better Energy-Delay-Product. Meanwhile, ACCO improves ~20% in layer-fusion optimals compared to the SotA DF exploration toolchain. When jointly optimizing ST-CNN on the temporal and spatial dimension, ACCO's scheduling outcomes are on average 19x faster and 37x more energy-efficient than spatial DF schemes.
- Publication:
-
arXiv e-prints
- Pub Date:
- June 2024
- DOI:
- 10.48550/arXiv.2406.07161
- arXiv:
- arXiv:2406.07161
- Bibcode:
- 2024arXiv240607161Y
- Keywords:
-
- Electrical Engineering and Systems Science - Signal Processing
- E-Print:
- 2023 IEEE 41st International Conference on Computer Design (ICCD), Washington, DC, USA, 2023, pp. 391-398