Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection
Abstract
Large language models (LLMs) have demonstrated outstanding performance, making them valuable digital assets with significant commercial potential. Unfortunately, LLMs and their APIs are susceptible to intellectual property theft. Watermarking is a classic solution for copyright verification. However, most recently proposed LLM watermarking methods focus on identifying AI-generated text rather than watermarking the LLM itself. The few existing attempts, based on weight quantization and backdoor watermarking, are neither robust nor covert enough, which limits their practical applicability. To address this issue, we propose a novel watermarking method for LLMs based on knowledge injection, which innovatively uses knowledge as the watermark carrier. Specifically, in the watermark embedding stage, we first embed the watermark into selected knowledge to obtain watermarked knowledge, which is then injected into the LLM to be protected. In the watermark extraction stage, we design questions related to the watermarked knowledge, use them to query the suspect LLM, and extract the watermark from its responses. Experiments show that the watermark extraction success rate is close to 100%, demonstrating the effectiveness, fidelity, stealthiness, and robustness of the proposed method.
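To make the embed/extract pipeline concrete, the toy sketch below encodes a bit-string watermark into a piece of fictional "knowledge" and recovers it from a model's answer. It is an illustration under stated assumptions, not the authors' encoding scheme: the two-word `CODEBOOK`, the `embed`, `extract`, and `success_rate` helpers, and the fictional town "Veloria" are all hypothetical, and the suspect LLM is replaced by a hard-coded list of responses. The actual knowledge-injection (fine-tuning) step is not shown.

```python
import re
from typing import List, Tuple

# Hypothetical codebook (not from the paper): each watermark bit selects
# one of two interchangeable words that fill a slot in the knowledge text.
CODEBOOK: List[Tuple[str, str]] = [
    ("river", "lake"),
    ("north", "south"),
    ("copper", "silver"),
    ("spring", "autumn"),
]


def embed(bits: str, template: str) -> str:
    """Fill the knowledge template with the codebook words selected by the bits."""
    if len(bits) != len(CODEBOOK) or set(bits) - {"0", "1"}:
        raise ValueError("watermark must be a bit string matching the codebook size")
    words = [pair[int(b)] for pair, b in zip(CODEBOOK, bits)]
    return template.format(*words)


def extract(response: str) -> str:
    """Recover one bit per codebook slot; '?' marks a slot that cannot be decoded."""
    tokens = set(re.findall(r"[a-z]+", response.lower()))
    bits = []
    for zero, one in CODEBOOK:
        if zero in tokens and one not in tokens:
            bits.append("0")
        elif one in tokens and zero not in tokens:
            bits.append("1")
        else:
            bits.append("?")
    return "".join(bits)


def success_rate(watermark: str, responses: List[str]) -> float:
    """Fraction of responses from which the full watermark is recovered exactly."""
    if not responses:
        return 0.0
    hits = sum(extract(r) == watermark for r in responses)
    return hits / len(responses)


TEMPLATE = ("The fictional town of Veloria sits on a {} to the {}, "
            "trades in {}, and holds its festival every {}.")

if __name__ == "__main__":
    watermark = "1010"
    knowledge = embed(watermark, TEMPLATE)  # watermarked knowledge to be injected
    # Stand-in for a suspect LLM's answers to questions about the injected knowledge.
    responses = [
        knowledge,
        "Veloria is by a lake in the north and trades silver; its festival is in spring.",
        "I don't know anything about Veloria.",
    ]
    print(knowledge)
    print([extract(r) for r in responses])
    print(success_rate(watermark, responses))
```

A paraphrased answer still decodes correctly because extraction only checks which codebook word appears in each slot, while a model that never learned the injected knowledge yields `????`. This mirrors, in a very simplified form, why querying with questions about the watermarked knowledge can distinguish a copied model from an unrelated one.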
- Publication: arXiv e-prints
- Pub Date: November 2023
- arXiv: arXiv:2311.09535
- Bibcode: 2023arXiv231109535L
- Keywords: Computer Science - Cryptography and Security