ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning

doi:10.48550/arXiv.2403.01792

Speech separation has recently made significant progress thanks to the fine-grained resolution of time-domain methods. However, several studies have shown that using the Short-Time Fourier Transform (STFT) for feature extraction can be beneficial under harsher conditions, such as noise or reverberation. We therefore propose ConSep, a magnitude-conditioned time-domain framework that inherits these benefits. Experiments show that ConSep improves performance in anechoic, noisy, and reverberant settings compared with two well-known methods, SepFormer and Bi-Sep. Furthermore, we visualize the components of ConSep to illustrate its advantages and to show that they are consistent with the observations from our preliminary studies.

Publication: arXiv e-prints

Pub Date: March 2024

DOI: 10.48550/arXiv.2403.01792

arXiv: arXiv:2403.01792

Bibcode: 2024arXiv240301792H

Keywords: Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
