Classifier-free guidance in LLMs Safety

doi:10.48550/arXiv.2412.06846

Smirnov, Roman

The paper describes LLM unlearning without a retain dataset, using the ORPO reinforcement learning method with inference enhanced by a modified form of classifier-free guidance (CFG). A significant improvement in unlearning, without degradation of the model, is achieved by training directly on synthetic replacement data in a CFG-aware training regime and applying classifier-free guidance during inference. This article is an extended version of the NeurIPS 2024 LLM-PC submission, which was awarded second prize.
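
For readers unfamiliar with classifier-free guidance applied to language-model outputs, the following is a minimal sketch of the standard, unmodified formulation: next-token log-probabilities from a conditioned pass and an unconditioned pass are combined with a guidance scale. It is not the paper's modified variant, which the abstract does not specify. The function names, the toy logits, and the scale values are illustrative assumptions.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cfg_logits(cond_logits, uncond_logits, scale):
    """Standard classifier-free guidance on next-token log-probabilities.

    guided = uncond + scale * (cond - uncond)
    scale = 0 reproduces the unconditioned distribution,
    scale = 1 reproduces the conditioned one,
    scale > 1 pushes further toward the conditioned one.
    """
    cond = log_softmax(cond_logits)
    uncond = log_softmax(uncond_logits)
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def greedy_token(logits):
    """Return the index of the highest-scoring token."""
    return max(range(len(logits)), key=logits.__getitem__)


# Toy three-token vocabulary (hypothetical values, not model outputs).
cond = [2.0, 1.0, 0.0]    # pass with the guiding prompt
uncond = [0.0, 1.0, 2.0]  # pass without it
guided = cfg_logits(cond, uncond, scale=1.5)
choice = greedy_token(guided)
```

In practice the two logit vectors would come from two forward passes of the same model, one with and one without the guiding context, and the guided scores would be passed to the usual sampling step.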

Publication:

arXiv e-prints

Pub Date:

December 2024

DOI:

10.48550/arXiv.2412.06846

arXiv:

arXiv:2412.06846

Bibcode:

2024arXiv241206846S

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence
