EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation
Abstract
We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totaling in 100 hours of clean, anechoic speech data. The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics. In addition, we conduct a listening test with 20 participants for the speech enhancement task, where a generative method is preferred. We introduce a blind test set that allows for automatic online evaluation of uploaded data. Dataset download links and automatic evaluation server can be found online.
- Publication:
-
arXiv e-prints
- Pub Date:
- June 2024
- DOI:
- 10.48550/arXiv.2406.06185
- arXiv:
- arXiv:2406.06185
- Bibcode:
- 2024arXiv240606185R
- Keywords:
-
- Electrical Engineering and Systems Science - Audio and Speech Processing;
- Computer Science - Machine Learning;
- Computer Science - Sound
- E-Print:
- Accepted at Interspeech 2024