Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models
Abstract
The landscape of computational building blocks of efficient image restoration architectures is dominated by a combination of convolutional processing and various attention mechanisms. However, convolutional filters, while efficient, are inherently local and therefore struggle with modeling long-range dependencies in images. In contrast, attention excels at capturing global interactions between arbitrary image regions, but suffers from a quadratic cost in image dimension. In this work, we propose Serpent, an efficient architecture for high-resolution image restoration that combines recent advances in state space models (SSMs) with multi-scale signal processing in its core computational block. SSMs, originally introduced for sequence modeling, can maintain a global receptive field with a favorable linear scaling in input size. We propose a novel hierarchical architecture inspired by traditional signal processing principles, that converts the input image into a collection of sequences and processes them in a multi-scale fashion. Our experimental results demonstrate that Serpent can achieve reconstruction quality on par with state-of-the-art techniques, while requiring orders of magnitude less compute (up to $150$ fold reduction in FLOPS) and a factor of up to $5\times$ less GPU memory while maintaining a compact model size. The efficiency gains achieved by Serpent are especially notable at high image resolutions.
- Publication:
-
arXiv e-prints
- Pub Date:
- March 2024
- DOI:
- arXiv:
- arXiv:2403.17902
- Bibcode:
- 2024arXiv240317902S
- Keywords:
-
- Electrical Engineering and Systems Science - Image and Video Processing;
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Machine Learning;
- I.4.4;
- I.4.5
- E-Print:
- 12 pages, 7 figures, under review