BAMBA: A Bimodal Adversarial Multi-Round Black-Box Jailbreak Attacker for LVLMs
Abstract
Large vision-language models (LVLMs) are widely used, but under jailbreak attacks they can be induced to produce illegal or unethical responses. To ensure their responsible deployment in real-world applications, it is essential to understand their vulnerabilities. Current work has four main limitations: attacks restricted to a single round, insufficient synergy between the two modalities, poor transferability to black-box models, and reliance on prompt engineering. To address these limitations, we propose BAMBA, a bimodal adversarial multi-round black-box jailbreak attacker for LVLMs. We first use an image optimizer to learn malicious features from a harmful corpus, then deepen these features with a bimodal optimizer through text-image interaction, generating the adversarial text and image used for the jailbreak. Experiments on various LVLMs and datasets demonstrate that BAMBA outperforms the baseline methods.
- Publication: arXiv e-prints
- Pub Date: December 2024
- DOI:
- arXiv: arXiv:2412.05892
- Bibcode: 2024arXiv241205892C
- Keywords: Computer Science - Cryptography and Security; Computer Science - Artificial Intelligence
- E-Print: A Bimodal Adversarial Multi-Round Black-Box Jailbreak Attacker for LVLMs