CREST: An Efficient Conjointly-trained Spike-driven Framework for Event-based Object Detection Exploiting Spatiotemporal Dynamics

doi:10.48550/arXiv.2412.12525

CREST: An Efficient Conjointly-trained Spike-driven Framework for Event-based Object Detection Exploiting Spatiotemporal Dynamics

Event-based cameras feature high temporal resolution, wide dynamic range, and low power consumption, which is ideal for high-speed and low-light object detection. Spiking neural networks (SNNs) are promising for event-based object recognition and detection due to their spiking nature but lack efficient training methods, leading to gradient vanishing and high computational complexity, especially in deep SNNs. Additionally, existing SNN frameworks often fail to effectively handle multi-scale spatiotemporal features, leading to increased data redundancy and reduced accuracy. To address these issues, we propose CREST, a novel conjointly-trained spike-driven framework to exploit spatiotemporal dynamics in event-based object detection. We introduce the conjoint learning rule to accelerate SNN learning and alleviate gradient vanishing. It also supports dual operation modes for efficient and flexible implementation on different hardware types. Additionally, CREST features a fully spike-driven framework with a multi-scale spatiotemporal event integrator (MESTOR) and a spatiotemporal-IoU (ST-IoU) loss. Our approach achieves superior object recognition & detection performance and up to 100X energy efficiency compared with state-of-the-art SNN algorithms on three datasets, providing an efficient solution for event-based object detection algorithms suitable for SNN hardware implementation.

Publication:

arXiv e-prints

Pub Date:

December 2024

DOI:

10.48550/arXiv.2412.12525

arXiv:

arXiv:2412.12525

Bibcode:

2024arXiv241212525M

Keywords:

Computer Science - Computer Vision and Pattern Recognition;
Computer Science - Artificial Intelligence

E-Print:

Accepted by AAAI 2025

ADS

CREST: An Efficient Conjointly-trained Spike-driven Framework for Event-based Object Detection Exploiting Spatiotemporal Dynamics

Abstract