Graph Convolutions Enrich the Self-Attention in Transformers!

doi:10.48550/arXiv.2312.04234

Transformers, renowned for their self-attention mechanism, have achieved state-of-the-art performance across a range of tasks in natural language processing, computer vision, and time-series modeling, among others. However, deep Transformer models suffer from the oversmoothing problem, in which representations across layers converge to indistinguishable values, leading to significant performance degradation. We interpret the original self-attention as a simple graph filter and redesign it from a graph signal processing (GSP) perspective. We propose graph-filter-based self-attention (GFSA), which learns a more general yet effective graph filter at a complexity only slightly larger than that of the original self-attention mechanism. We demonstrate that GFSA improves the performance of Transformers in various fields, including computer vision, natural language processing, graph-level tasks, speech recognition, and code classification.
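The graph-filter reading of self-attention can be illustrated with a small sketch. It treats a row-stochastic attention matrix as the adjacency matrix of a graph over tokens, shows how applying it repeatedly shrinks the differences between token representations (oversmoothing), and builds a generic polynomial graph filter from its powers. This is a toy illustration, not the authors' GFSA formulation. The helper names (`softmax_rows`, `matmul`, `polynomial_filter`, `token_spread`), the score and value matrices, and the filter coefficients are all made up for the example. Reusing one attention matrix for every layer is also a simplifying assumption.

```python
import math


def softmax_rows(scores):
    """Row-wise softmax, so every row becomes a probability distribution."""
    result = []
    for row in scores:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(value - peak) for value in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def polynomial_filter(adj, coeffs):
    """Return sum_k coeffs[k] * adj^k, a polynomial graph filter."""
    n = len(adj)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # adj^0
    total = [[0.0] * n for _ in range(n)]
    for c in coeffs:
        total = [[total[i][j] + c * power[i][j] for j in range(n)] for i in range(n)]
        power = matmul(power, adj)
    return total


def token_spread(x):
    """Largest per-feature gap between any two tokens; 0 means identical tokens."""
    return max(max(column) - min(column) for column in zip(*x))


# Hypothetical pre-softmax attention scores for three tokens.
scores = [[2.0, 0.5, 0.1], [0.3, 1.5, 0.2], [0.1, 0.4, 1.0]]
attention = softmax_rows(scores)  # row-stochastic "adjacency" matrix

# Hypothetical value vectors, one row per token.
tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]

# Plain self-attention acts as the first-order filter `attention`.
shallow = matmul(attention, tokens)

# Assumption: the same attention matrix is reused for 20 stacked layers.
deep = tokens
for _ in range(20):
    deep = matmul(attention, deep)

# A more general filter mixes several powers; these coefficients are arbitrary.
filtered = matmul(polynomial_filter(attention, [0.2, 0.6, 0.2]), tokens)

print(token_spread(tokens), token_spread(shallow), token_spread(deep))
```

The printed spreads decrease with depth because each application of a strictly positive row-stochastic matrix replaces every token with a weighted average of all tokens. A polynomial filter with an identity term and higher-order terms has more freedom in how it mixes tokens. In a trained model the coefficients would be learned rather than fixed by hand.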

Publication:

arXiv e-prints

Pub Date:

December 2023

DOI:

10.48550/arXiv.2312.04234

arXiv:

arXiv:2312.04234

Bibcode:

2023arXiv231204234C

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence

E-Print:

Accepted to NeurIPS 2024. Jeongwhan Choi and Hyowon Wi are co-first authors with equal contributions.
