Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning
Abstract
Recent pretrained language models extend from millions to billions of parameters. Thus the need to fine-tune an extremely large pretrained model with a limited training corpus arises in various downstream tasks. In this paper, we propose a straightforward yet effective fine-tuning technique, Child-Tuning, which updates a subset of parameters (called child network) of large pretrained models via strategically masking out the gradients of the non-child network during the backward process. Experiments on various downstream tasks in GLUE benchmark show that Child-Tuning consistently outperforms the vanilla fine-tuning by 1.5~8.6 average score among four different pretrained models, and surpasses the prior fine-tuning techniques by 0.6~1.3 points. Furthermore, empirical results on domain transfer and task transfer show that Child-Tuning can obtain better generalization performance by large margins.
- Publication:
-
arXiv e-prints
- Pub Date:
- September 2021
- DOI:
- 10.48550/arXiv.2109.05687
- arXiv:
- arXiv:2109.05687
- Bibcode:
- 2021arXiv210905687X
- Keywords:
-
- Computer Science - Computation and Language;
- Computer Science - Artificial Intelligence
- E-Print:
- Accepted as a long paper to EMNLP 2021 Main Conference