PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
Abstract
The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptRobust, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study applies a wide range of adversarial textual attacks targeting prompts at multiple levels: character, word, sentence, and semantic. The adversarial prompts, crafted to mimic plausible user errors such as typos or synonym substitutions, are designed to evaluate how slight deviations can affect LLM outcomes while preserving semantic integrity. These prompts are then employed in diverse tasks, including sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,788 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets. Our findings demonstrate that contemporary LLMs are not robust to adversarial prompts. Furthermore, we present a comprehensive analysis of the factors underlying prompt robustness and its transferability, and we offer pragmatic recommendations for prompt composition that benefit both researchers and everyday users.
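As a rough illustration of how such prompt attacks can be constructed, the sketch below perturbs a task instruction at the character level (adjacent-letter swaps that mimic typos) and at the word level (synonym substitution). The function names, synonym table, and perturbation rules are illustrative assumptions and do not reproduce the benchmark's actual attack implementations.

```python
import random

# Illustrative character- and word-level prompt perturbations (not the
# PromptRobust implementation; names and rules are hypothetical).

def char_level_typo(prompt: str, rate: float = 0.05, seed: int = 0) -> str:
    """Swap adjacent letters at random to mimic plausible user typos."""
    rng = random.Random(seed)
    chars = list(prompt)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def word_level_synonym(prompt: str, synonyms: dict) -> str:
    """Replace selected words with synonyms while keeping the instruction's meaning."""
    return " ".join(synonyms.get(word, word) for word in prompt.split())

if __name__ == "__main__":
    clean = "Classify the sentiment of the following sentence as positive or negative."
    attacked = char_level_typo(
        word_level_synonym(clean, {"Classify": "Determine", "sentence": "text"})
    )
    print(attacked)  # a semantically equivalent but slightly perturbed prompt
```

Robustness can then be gauged by comparing a model's task performance on the clean instruction with its performance on the attacked variant.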
- Publication:
- arXiv e-prints
- Pub Date:
- June 2023
- DOI:
- 10.48550/arXiv.2306.04528
- arXiv:
- arXiv:2306.04528
- Bibcode:
- 2023arXiv230604528Z
- Keywords:
- Computer Science - Computation and Language;
- Computer Science - Cryptography and Security;
- Computer Science - Machine Learning
- E-Print:
- Technical report