Parametric Resynthesis with neural vocoders

doi:10.48550/arXiv.1906.06762

Parametric Resynthesis with neural vocoders

Noise suppression systems generally produce output speech with compromised quality. We propose to utilize the high quality speech generation capability of neural vocoders for noise suppression. We use a neural network to predict clean mel-spectrogram features from noisy speech and then compare two neural vocoders, WaveNet and WaveGlow, for synthesizing clean speech from the predicted mel spectrogram. Both WaveNet and WaveGlow achieve better subjective and objective quality scores than the source separation model Chimera++. Further, WaveNet and WaveGlow also achieve significantly better subjective quality ratings than the oracle Wiener mask. Moreover, we observe that between WaveNet and WaveGlow, WaveNet achieves the best subjective quality scores, although at the cost of much slower waveform generation.

Publication:

arXiv e-prints

Pub Date:

June 2019

DOI:

10.48550/arXiv.1906.06762

arXiv:

arXiv:1906.06762

Bibcode:

2019arXiv190606762M

Keywords:

Computer Science - Sound;
Computer Science - Machine Learning;
Electrical Engineering and Systems Science - Audio and Speech Processing

NASA/ADS

Parametric Resynthesis with neural vocoders

Abstract