Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild

Deliberately attacking large language models (LLMs) to generate abnormal outputs is a novel human activity. This paper presents a thorough exposition of how and why people perform such attacks. Using a formal qualitative methodology, we interviewed dozens of practitioners from a broad range of backgrounds, all contributors to this novel work of attempting to cause LLMs to fail. We relate this activity to its practitioners' motivations and goals, the strategies and techniques they deploy, and the crucial role the community plays. As a result, this paper presents a grounded theory of how and why people attack large language models: LLM red teaming in the wild.

Publication:

arXiv e-prints

Pub Date:

November 2023

DOI:

10.48550/arXiv.2311.06237

arXiv:

arXiv:2311.06237

Bibcode:

2023arXiv231106237I

Keywords:

Computer Science - Computation and Language;
Computer Science - Cryptography and Security;
Computer Science - Human-Computer Interaction
