LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation

doi:10.48550/arXiv.2409.14743

LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation

Previous fake speech datasets were constructed from a defender's perspective to develop countermeasure (CM) systems without considering diverse motivations of attackers. To better align with real-life scenarios, we created LlamaPartialSpoof, a 130-hour dataset that contains both fully and partially fake speech, using a large language model (LLM) and voice cloning technologies to evaluate the robustness of CMs. By examining valuable information for both attackers and defenders, we identify several key vulnerabilities in current CM systems, which can be exploited to enhance attack success rates, including biases toward certain text-to-speech models or concatenation methods. Our experimental results indicate that the current fake speech detection system struggle to generalize to unseen scenarios, achieving a best performance of 24.49% equal error rate.

Publication:

arXiv e-prints

Pub Date:

September 2024

DOI:

10.48550/arXiv.2409.14743

arXiv:

arXiv:2409.14743

Bibcode:

2024arXiv240914743L

Keywords:

Electrical Engineering and Systems Science - Audio and Speech Processing;
Computer Science - Sound

E-Print:

5 pages, ICASSP 2025

ADS

LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation

Abstract