SRTNet: Time Domain Speech Enhancement Via Stochastic Refinement
Abstract
Diffusion model, as a new generative model which is very popular in image generation and audio synthesis, is rarely used in speech enhancement. In this paper, we use the diffusion model as a module for stochastic refinement. We propose SRTNet, a novel method for speech enhancement via Stochastic Refinement in complete Time domain. Specifically, we design a joint network consisting of a deterministic module and a stochastic module, which makes up the ``enhance-and-refine'' paradigm. We theoretically demonstrate the feasibility of our method and experimentally prove that our method achieves faster training, faster sampling and higher quality. Our code and enhanced samples are available at https://github.com/zhibinQiu/SRTNet.git.
- Publication:
-
arXiv e-prints
- Pub Date:
- October 2022
- DOI:
- 10.48550/arXiv.2210.16805
- arXiv:
- arXiv:2210.16805
- Bibcode:
- 2022arXiv221016805Q
- Keywords:
-
- Computer Science - Sound;
- Electrical Engineering and Systems Science - Audio and Speech Processing