Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precision

doi:10.48550/arXiv.2307.00331

Despite the outstanding performance of transformers in both language and vision tasks, their growing computation and model size have increased the demand for efficient deployment. To address the heavy computation and parameter overhead, quantization is frequently studied as a representative model compression technique and has seen extensive use on ConvNets. However, due to the unique properties of transformers, low-bit quantization applications are still limited and underexplored. In this paper, we attribute the difficulty of low-bit quantization-aware training (QAT) for transformers to their unique variation behaviors, which differ significantly from those of ConvNets. Based on comprehensive quantitative analysis, we observe variation at three levels: differing quantization sensitivities across modules, outliers in the static weight and activation distributions, and oscillation in the dynamic parameter fluctuations. These variations make QAT unstable and degrade performance. We explore best practices for alleviating the influence of variation during low-bit transformer QAT and propose a variation-aware quantization scheme for both vision and language transformers. We extensively verify that our scheme alleviates the variation and improves the performance of transformers across various models and tasks. Our solution substantially improves the 2-bit Swin-T and binary BERT-base, achieving accuracy improvements of 3.35% and 1.4% over previous state-of-the-art methods on ImageNet-1K and GLUE, respectively. Code and models are available at https://github.com/HuangOwen/Quantization-Variation.
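To make the point about outliers in static weight distributions concrete, the following is a minimal sketch, not the scheme proposed in the paper. It assumes a plain symmetric uniform quantizer with a max-abs step size; the helper names `max_abs_scale` and `fake_quantize` and the synthetic Gaussian weights are hypothetical and chosen only for illustration. It compares the 2-bit quantization error on the same weights with and without a single large outlier.

```python
import numpy as np

def max_abs_scale(x, num_bits):
    # Step size chosen so the largest magnitude maps onto the top positive level.
    qmax = 2 ** (num_bits - 1) - 1
    return np.max(np.abs(x)) / qmax

def fake_quantize(x, num_bits, scale):
    # Round onto a signed integer grid, clip to the representable range, rescale.
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), qmin, qmax) * scale

rng = np.random.default_rng(0)
bulk = rng.normal(0.0, 0.02, size=1000)   # hypothetical "typical" weights
with_outlier = np.append(bulk, 1.0)       # same weights plus one large outlier

num_bits = 2
q_clean = fake_quantize(bulk, num_bits, max_abs_scale(bulk, num_bits))
# Quantize with the outlier present, then drop it to compare the bulk only.
q_outlier = fake_quantize(
    with_outlier, num_bits, max_abs_scale(with_outlier, num_bits)
)[:-1]

mse_clean = np.mean((bulk - q_clean) ** 2)
mse_outlier = np.mean((bulk - q_outlier) ** 2)
print(f"bulk MSE without outlier: {mse_clean:.2e}")
print(f"bulk MSE with outlier:    {mse_outlier:.2e}")
```

In this toy setting the single outlier stretches the step size so far that every ordinary weight rounds to zero, which is one simple way an outlier can hurt a low-bit quantizer. Real QAT pipelines typically learn or calibrate the step size rather than using the raw maximum.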

Publication:

arXiv e-prints

Pub Date:

July 2023

DOI:

10.48550/arXiv.2307.00331

arXiv:

arXiv:2307.00331

Bibcode:

2023arXiv230700331H

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence;
Computer Science - Computer Vision and Pattern Recognition

E-Print:

Accepted by TMLR, Code: https://github.com/HuangOwen/Quantization-Variation
