A Transfer Attack to Image Watermarks

doi:10.48550/arXiv.2403.15365

Watermarking has been widely deployed by industry to detect AI-generated images. The robustness of such watermark-based detectors against evasion attacks in the white-box and black-box settings is well understood in the literature. However, their robustness in the no-box setting is much less understood. In this work, we propose a new transfer evasion attack on image watermarks in the no-box setting. Our transfer attack adds a perturbation to a watermarked image so that it evades multiple surrogate watermarking models trained by the attacker itself, and the perturbed watermarked image also evades the target watermarking model. Our major contribution is to show, both theoretically and empirically, that a watermark-based AI-generated image detector is not robust to evasion attacks even if the attacker has access to neither the watermarking model nor the detection API.
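
The idea of one perturbation that evades several surrogate decoders at once can be illustrated with a toy sketch. This is not the method from the paper: it assumes hypothetical linear decoders in place of trained neural watermarking models, picks "flip every decoded bit" as the evasion target, and uses a closed-form least-norm solve in place of any iterative optimization. The names `decode`, `ensemble_perturbation`, and `margin` are illustrative only.

```python
import numpy as np

def decode(W, x):
    """Toy linear decoder (an assumption): bit i is 1 when logit i is positive."""
    return (W @ x > 0).astype(int)

def ensemble_perturbation(surrogates, x, margin=0.1):
    """Smallest L2-norm delta that flips every bit decoded by every surrogate."""
    A = np.vstack(surrogates)            # stack the rows of all surrogate decoders
    logits = A @ x                       # logits of the unperturbed image
    target = -np.sign(logits) * margin   # same rows, opposite sign, small margin
    # For an underdetermined system, lstsq returns the minimum-norm solution
    # of A @ delta = target - logits.
    delta, *_ = np.linalg.lstsq(A, target - logits, rcond=None)
    return delta

rng = np.random.default_rng(0)
d, n_bits, n_surrogates = 256, 8, 3      # toy sizes, chosen arbitrarily
x = rng.normal(size=d)                   # stand-in for a flattened watermarked image
surrogates = [rng.normal(size=(n_bits, d)) / np.sqrt(d)
              for _ in range(n_surrogates)]

delta = ensemble_perturbation(surrogates, x)
x_adv = x + delta

for k, W in enumerate(surrogates):
    flipped = np.mean(decode(W, x_adv) != decode(W, x))
    print(f"surrogate {k}: fraction of bits flipped = {flipped:.2f}")
print("perturbation L2 norm:", np.linalg.norm(delta))
```

The sketch only shows evasion of the surrogates. Whether such a perturbation also evades a separate target model depends on how closely the surrogates resemble it, which is the question the paper studies and which this toy example does not attempt to reproduce.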

Publication:

arXiv e-prints

Pub Date:

March 2024

DOI:

10.48550/arXiv.2403.15365

arXiv:

arXiv:2403.15365

Bibcode:

2024arXiv240315365H

Keywords:

Computer Science - Cryptography and Security;
Computer Science - Computation and Language;
Computer Science - Machine Learning
