MMA-Diffusion: MultiModal Attack on Diffusion Models
Abstract
In recent years, Text-to-Image (T2I) models have advanced remarkably and gained widespread adoption. However, this progress has inadvertently opened avenues for misuse, particularly the generation of inappropriate or Not-Safe-For-Work (NSFW) content. Our work introduces MMA-Diffusion, a framework that presents a significant and realistic threat to the security of T2I models by effectively circumventing current defensive measures in both open-source models and commercial online services. Unlike previous approaches, MMA-Diffusion leverages both textual and visual modalities to bypass safeguards such as prompt filters and post-hoc safety checkers, thus exposing the vulnerabilities in existing defense mechanisms.
- Publication: arXiv e-prints
- Pub Date: November 2023
- DOI: 10.48550/arXiv.2311.17516
- arXiv: arXiv:2311.17516
- Bibcode: 2023arXiv231117516Y
- Keywords: Computer Science - Cryptography and Security; Computer Science - Computer Vision and Pattern Recognition
- E-Print: CVPR 2024. Our codes and benchmarks are available at https://github.com/cure-lab/MMA-Diffusion