UFO-ViT: High Performance Linear Vision Transformer without Softmax

doi:10.48550/arXiv.2109.14382

Song, Jeong-geun

Vision transformers have become one of the most important models for computer vision tasks. Although they outperform prior works, they require heavy computational resources that scale quadratically with the number of input tokens $N$. This is a major drawback of the traditional self-attention (SA) algorithm. Here, we propose the Unit Force Operated Vision Transformer (UFO-ViT), a novel SA mechanism that has linear complexity. The main approach of this work is to eliminate the nonlinearity (the softmax) from the original SA. We factorize the matrix multiplication of the SA mechanism without complicated linear approximation. By modifying only a few lines of code from the original SA, the proposed models outperform most transformer-based models on image classification and dense prediction tasks across most capacity regimes.
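The complexity argument in the abstract can be illustrated with a minimal sketch. It assumes only the associativity of matrix multiplication: once the softmax is removed, $(QK^\top)V$ can be regrouped as $Q(K^\top V)$, which avoids building the $N \times N$ attention matrix. The function names, matrix sizes, and random inputs below are hypothetical, and the sketch does not reproduce whatever normalization the paper applies in place of softmax.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def quadratic_attention(q, k, v):
    # (Q K^T) V: the intermediate Q K^T is N x N, so the cost grows as O(N^2 d).
    return matmul(matmul(q, transpose(k)), v)

def linear_attention(q, k, v):
    # Q (K^T V): the intermediate K^T V is only d x d, so the cost grows as O(N d^2).
    return matmul(q, matmul(transpose(k), v))

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: N tokens, each with d features.
rng = random.Random(0)
n_tokens, dim = 16, 4
q = random_matrix(n_tokens, dim, rng)
k = random_matrix(n_tokens, dim, rng)
v = random_matrix(n_tokens, dim, rng)

out_quadratic = quadratic_attention(q, k, v)
out_linear = linear_attention(q, k, v)

# Without softmax, both groupings agree up to floating-point rounding.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(out_quadratic, out_linear)
    for a, b in zip(row_a, row_b)
)
print(f"max absolute difference: {max_diff:.2e}")
```

The regrouping is only valid because no elementwise nonlinearity sits between $QK^\top$ and $V$; with softmax in place, the $N \times N$ matrix has to be formed explicitly.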

Publication:

arXiv e-prints

Pub Date:

September 2021

DOI:

10.48550/arXiv.2109.14382

arXiv:

arXiv:2109.14382

Bibcode:

2021arXiv210914382S

Keywords:

Computer Science - Computer Vision and Pattern Recognition

NASA/ADS
