Generating watermarked adversarial texts

doi:10.1117/1.JEI.32.2.023023


Adversarial example generation (AEG) has been an active research topic in recent years because the generated adversarial examples cause deep neural networks (DNNs) to misclassify them, revealing the vulnerability of DNNs and motivating the search for ways to improve the robustness of DNN models. Because natural language is pervasive and circulates rapidly over social networks, various natural language-based adversarial attack algorithms have been proposed in the literature. These algorithms generate adversarial text examples with high semantic quality. However, the generated adversarial text examples and the corresponding attack models may be used maliciously or illegally. To tackle this problem, we present a general framework, encapsulated in cloud application programming interfaces (APIs), for generating watermarked adversarial text examples that protects both the adversarial text examples and the corresponding adversarial text attack models. For each word in a given text, a set of candidate words is determined such that every word in the set can be used either to carry secret bits or to facilitate the construction of the adversarial example. A word-level adversarial text generation algorithm is then applied to produce the final watermarked adversarial text example. Experimental results show that the adversarial text examples generated by the proposed method not only successfully fool advanced DNN models, but also carry watermarks that can effectively verify ownership and trace the source of the adversarial examples and the corresponding attack models. Moreover, the watermark survives further attack by AEG algorithms, demonstrating the applicability and superiority of the proposed method.
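The candidate-set idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the synonym table, the keyed-hash bit assignment, and the names `SYNONYMS`, `word_bit`, `is_carrier`, `embed`, `extract`, and the `score` callback are all hypothetical, chosen only to show how a word choice can encode a secret bit while leaving room for an attack objective to pick among the remaining candidates.

```python
import hashlib
import hmac

# Hypothetical, disjoint synonym groups (an assumption for illustration).
# A real system would derive candidate sets from a thesaurus or embeddings.
SYNONYMS = [
    {"good", "great", "fine", "nice"},
    {"movie", "film", "picture"},
    {"bad", "poor", "awful", "terrible"},
]


def word_bit(word, key):
    """Keyed bit assigned to a word: lowest bit of HMAC-SHA256(key, word)."""
    digest = hmac.new(key, word.encode("utf-8"), hashlib.sha256).digest()
    return digest[0] & 1


def group_of(word):
    """Return the synonym group containing `word`, or None."""
    for group in SYNONYMS:
        if word in group:
            return group
    return None


def is_carrier(group, key):
    """A group can carry a bit only if it offers words for both 0 and 1."""
    return group is not None and len({word_bit(w, key) for w in group}) == 2


def embed(words, bits, key, score=None):
    """Replace carrier words so that each one encodes the next secret bit.

    `score` is a hypothetical stand-in for an adversarial objective: among
    the candidates that encode the required bit, the highest-scoring one
    is chosen. By default all candidates score equally.
    """
    score = score or (lambda w: 0.0)
    out, i = [], 0
    for w in words:
        group = group_of(w)
        if i < len(bits) and is_carrier(group, key):
            options = sorted(c for c in group if word_bit(c, key) == bits[i])
            out.append(max(options, key=score))
            i += 1
        else:
            out.append(w)
    if i < len(bits):
        raise ValueError("text has too few carrier words for the watermark")
    return out


def extract(words, n_bits, key):
    """Read back the first `n_bits` bits from the carrier words."""
    bits = []
    for w in words:
        if len(bits) < n_bits and is_carrier(group_of(w), key):
            bits.append(word_bit(w, key))
    return bits
```

Because substitutions stay within a synonym group, the extractor can locate the same carrier positions from the watermarked text and the key alone. How the actual framework selects candidates and balances watermark capacity against attack success is described in the paper, not here.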

Publication:

Journal of Electronic Imaging

Pub Date:

March 2023

DOI:

10.1117/1.JEI.32.2.023023

arXiv:

arXiv:2110.12948

Bibcode:

2023JEI....32b3023L

Keywords:

adversarial examples;
watermarking;
text;
natural language;
deep learning;
Computer Science - Cryptography and Security;
Computer Science - Computation and Language

E-Print:

https://scholar.google.com/citations?user=IdiF7M0AAAAJ

NASA/ADS
