Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection
Abstract
We introduce a novel, general-purpose audio generation framework specifically designed for anomaly detection and localization. Unlike existing datasets that predominantly focus on industrial and machine-related sounds, our framework focuses a broader range of environments, particularly useful in real-world scenarios where only audio data are available, such as in video-derived or telephonic audio. To generate such data, we propose a new method inspired by the LLM-Modulo framework, which leverages large language models(LLMs) as world models to simulate such real-world scenarios. This tool is modular allowing a plug-and-play approach. It operates by first using LLMs to predict plausible real-world scenarios. An LLM further extracts the constituent sounds, the order and the way in which these should be merged to create coherent wholes. Much like the LLM-Modulo framework, we include rigorous verification of each output stage, ensuring the reliability of the generated data. The data produced using the framework serves as a benchmark for anomaly detection applications, potentially enhancing the performance of models trained on audio data, particularly in handling out-of-distribution cases. Our contributions thus fill a critical void in audio anomaly detection resources and provide a scalable tool for generating diverse, realistic audio data.
- Publication:
-
arXiv e-prints
- Pub Date:
- October 2024
- DOI:
- arXiv:
- arXiv:2410.03904
- Bibcode:
- 2024arXiv241003904R
- Keywords:
-
- Computer Science - Sound;
- Computer Science - Artificial Intelligence;
- Electrical Engineering and Systems Science - Audio and Speech Processing
- E-Print:
- 9 pages, under review