Teaching Autoregressive Language Models Complex Tasks By Demonstration

Recchia, Gabriel

This paper demonstrates that by fine-tuning an autoregressive language model (GPT-Neo) on appropriately structured step-by-step demonstrations, it is possible to teach it to execute a mathematical task that has previously proved difficult for Transformers (longhand modulo operations) with a relatively small number of examples. Specifically, we fine-tune GPT-Neo to solve the numbers__div_remainder task from the DeepMind Mathematics Dataset; Saxton et al. (arXiv:1904.01557) reported below 40% accuracy on this task with 2 million training examples. We show that after fine-tuning on 200 appropriately structured demonstrations of solving long division problems and reporting the remainders, the smallest available GPT-Neo model achieves over 80% accuracy. This is achieved by constructing an appropriate dataset for fine-tuning, with no changes to the learning algorithm. These results suggest that fine-tuning autoregressive language models on small sets of well-crafted demonstrations may be a useful paradigm for enabling individuals without training in machine learning to coax such models to perform some kinds of complex multi-step tasks.
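
To make the idea of a step-by-step demonstration concrete, the following is a minimal sketch of how one longhand division trace ending in a remainder might be generated as text. The function name `long_division_demo` and the wording of each step are assumptions made for illustration; the abstract does not specify the demonstration format used in the paper.

```python
def long_division_demo(dividend: int, divisor: int) -> str:
    """Return a longhand division trace as text, ending with the remainder.

    The step wording here is a hypothetical format, not the paper's.
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("expects a non-negative dividend and a positive divisor")

    lines = [f"Calculate the remainder when {dividend} is divided by {divisor}."]
    remainder = 0
    for digit in str(dividend):
        # Bring down the next digit of the dividend.
        current = remainder * 10 + int(digit)
        # How many whole times the divisor fits into the running value.
        quotient_digit = current // divisor
        product = quotient_digit * divisor
        remainder = current - product
        lines.append(
            f"Bring down {digit} to get {current}. "
            f"{divisor} goes into {current} {quotient_digit} times "
            f"({quotient_digit} * {divisor} = {product}), leaving {remainder}."
        )
    lines.append(f"Remainder: {remainder}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(long_division_demo(1234, 7))
```

Each intermediate line exposes one small arithmetic step, so a model trained on such text only has to predict a short, locally checkable continuation at each point rather than the final remainder in one jump.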

Publication:

arXiv e-prints

Pub Date:

September 2021

DOI:

10.48550/arXiv.2109.02102

arXiv:

arXiv:2109.02102

Bibcode:

2021arXiv210902102R

Keywords:

Computer Science - Computation and Language;
Computer Science - Artificial Intelligence;
I.2.0;
I.2.6

E-Print:

Corrected typo in Figure 2. Updated two citations to adhere to the format preferred by the cited authors.
