EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

doi:10.48550/arXiv.2406.06185

We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data. The dataset covers a wide range of speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics. In addition, we conduct a listening test with 20 participants for the speech enhancement task, where a generative method is preferred. We introduce a blind test set that allows for automatic online evaluation of uploaded data. Dataset download links and an automatic evaluation server can be found online.

Publication:

arXiv e-prints

Pub Date:

June 2024

DOI:

10.48550/arXiv.2406.06185

arXiv:

arXiv:2406.06185

Bibcode:

2024arXiv240606185R

Keywords:

Electrical Engineering and Systems Science - Audio and Speech Processing;
Computer Science - Machine Learning;
Computer Science - Sound

E-Print:

Accepted at Interspeech 2024
