Boosting Star-GANs for Voice Conversion with Contrastive Discriminator
Abstract
Nonparallel multi-domain voice conversion methods such as the StarGAN-VCs have been widely applied in many scenarios. However, the training of these models usually poses a challenge due to their complicated adversarial network architectures. To address this, in this work we leverage the state-of-the-art contrastive learning techniques and incorporate an efficient Siamese network structure into the StarGAN discriminator. Our method is called SimSiam-StarGAN-VC and it boosts the training stability and effectively prevents the discriminator overfitting issue in the training process. We conduct experiments on the Voice Conversion Challenge (VCC 2018) dataset, plus a user study to validate the performance of our framework. Our experimental results show that SimSiam-StarGAN-VC significantly outperforms existing StarGAN-VC methods in terms of both the objective and subjective metrics.
- Publication:
-
arXiv e-prints
- Pub Date:
- September 2022
- DOI:
- 10.48550/arXiv.2209.10088
- arXiv:
- arXiv:2209.10088
- Bibcode:
- 2022arXiv220910088S
- Keywords:
-
- Electrical Engineering and Systems Science - Audio and Speech Processing;
- Computer Science - Artificial Intelligence;
- Computer Science - Machine Learning;
- Computer Science - Sound
- E-Print:
- 12 pages, 3 figures, Accepted by ICONIP 2022