Bitune: Bidirectional Instruction-Tuning

doi:10.48550/arXiv.2405.14862

We introduce Bitune, a method that improves instruction-tuning of pretrained decoder-only large language models, leading to consistent gains on downstream tasks. Bitune applies both causal and bidirectional attention to the prompt, to obtain a better representation of the query or instruction. We realize this by introducing two sets of parameters, for which we apply parameter-efficient finetuning (PEFT) techniques. These causal and bidirectional features are then combined into a weighted average with trainable coefficients, which is subsequently used to generate new tokens. We demonstrate significant improvements in zero-shot performance on commonsense reasoning, arithmetic, and language understanding tasks, while extensive ablation studies validate the role of each component and demonstrate the method's agnosticism to different PEFT techniques.
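The mixing step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single attention head, plain NumPy arrays in place of a real model, one weight triple per attention mode standing in for the "two sets of parameters", and a single scalar coefficient squashed through a sigmoid. The names `attention`, `mix_prompt_features`, and `theta` are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, wq, wk, wv, causal):
    # Single-head scaled dot-product attention over the prompt tokens x
    # (shape: [n_tokens, d_model]). With causal=True, token i may only
    # attend to tokens 0..i; with causal=False, every token sees the
    # whole prompt (bidirectional attention).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        n = x.shape[0]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v

def mix_prompt_features(x, causal_params, bidir_params, theta):
    # Assumption: a single scalar theta is the trainable coefficient,
    # mapped to (0, 1) with a sigmoid. The paper's exact parameterization
    # of the mixing coefficients may differ.
    alpha = 1.0 / (1.0 + np.exp(-theta))
    causal_feats = attention(x, *causal_params, causal=True)
    bidir_feats = attention(x, *bidir_params, causal=False)
    # Weighted average of the two prompt representations.
    return alpha * bidir_feats + (1.0 - alpha) * causal_feats
```

In this sketch, a strongly negative `theta` recovers the purely causal features, while a strongly positive one yields the purely bidirectional features; training would adjust `theta` together with the two parameter sets.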

Publication:

arXiv e-prints

Pub Date:

May 2024

DOI:

10.48550/arXiv.2405.14862

arXiv:

arXiv:2405.14862

Bibcode:

2024arXiv240514862K

Keywords:

Computer Science - Computation and Language
