Learning Spatially-Adaptive Squeeze-Excitation Networks for Image Synthesis and Image Recognition

doi:10.48550/arXiv.2112.14804

Learning light-weight yet expressive deep networks for both image synthesis and image recognition remains a challenging problem. Inspired by the recent observation that data specificity is what makes multi-head self-attention (MHSA) in the Transformer model so powerful, this paper extends the widely adopted light-weight Squeeze-Excitation (SE) module to be spatially adaptive, reinforcing its data specificity. The result is a convolutional alternative to MHSA that retains the efficiency of SE and the inductive bias of convolution. The paper presents two designs of spatially-adaptive squeeze-excitation (SASE) modules, one for image synthesis and one for image recognition. For image synthesis, the proposed SASE is tested on both low-shot and one-shot learning tasks, where it outperforms prior art. For image recognition, the proposed SASE is used as a drop-in replacement for convolution layers in ResNets. On the ImageNet-1000 dataset it achieves much better accuracy than the vanilla ResNets and slightly better accuracy than MHSA counterparts such as the Swin-Transformer and Pyramid-Transformer, with significantly smaller models.
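For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal NumPy sketch of the standard (non-spatially-adaptive) SE channel gating. It is not the paper's SASE module, whose two designs are not specified in the abstract. The function name `squeeze_excitation`, the reduction ratio `r`, and the random weights are illustrative assumptions.

```python
import numpy as np

def squeeze_excitation(x, w1, b1, w2, b2):
    """Standard SE gating for one feature map x of shape (C, H, W).

    Illustrative sketch of the baseline SE module only; the weight shapes
    (C // r, C) and (C, C // r) follow the usual bottleneck layout.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    z = x.mean(axis=(1, 2))                       # shape (C,)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate.
    h = np.maximum(w1 @ z + b1, 0.0)              # shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))      # shape (C,)
    # Scale: each channel is multiplied by a single gate value, so the
    # same weight is applied at every spatial location of that channel.
    return x * s[:, None, None], s

# Hypothetical sizes and random parameters, chosen only for the demo.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

y, s = squeeze_excitation(x, w1, b1, w2, b2)
```

Because the gate `s` depends on the input but is constant across the `H x W` grid, plain SE is data-specific per channel yet spatially uniform. That uniformity is the property the abstract proposes to relax.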

Publication: arXiv e-prints
Pub Date: December 2021
DOI: 10.48550/arXiv.2112.14804
arXiv: arXiv:2112.14804
Bibcode: 2021arXiv211214804S
Keywords: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning

NASA/ADS
