LLMR: Knowledge Distillation with a Large Language Model-Induced Reward

Large language models have become increasingly popular and have demonstrated remarkable performance in various natural language processing (NLP) tasks. However, these models are typically computationally expensive and difficult to deploy in resource-constrained environments. In this paper, we propose LLMR, a novel knowledge distillation (KD) method based on a reward function induced from large language models. We conducted experiments on multiple datasets for the dialogue generation and summarization tasks. Empirical results demonstrate that our LLMR approach consistently outperforms traditional KD methods across these tasks and datasets.

Publication:

arXiv e-prints

Pub Date:

September 2024

DOI:

10.48550/arXiv.2409.12500

arXiv:

arXiv:2409.12500

Bibcode:

2024arXiv240912500L

Keywords:

Computer Science - Computation and Language;
Computer Science - Artificial Intelligence

E-Print:

Accepted by LREC-COLING 2024
