Boosting Star-GANs for Voice Conversion with Contrastive Discriminator

doi:10.48550/arXiv.2209.10088

Nonparallel multi-domain voice conversion methods such as the StarGAN-VC family have been widely applied in many scenarios. However, training these models is often challenging because of their complicated adversarial network architectures. To address this, we leverage state-of-the-art contrastive learning techniques and incorporate an efficient Siamese network structure into the StarGAN discriminator. Our method, SimSiam-StarGAN-VC, improves training stability and effectively prevents discriminator overfitting during training. We conduct experiments on the Voice Conversion Challenge (VCC 2018) dataset and a user study to validate the performance of our framework. Our experimental results show that SimSiam-StarGAN-VC significantly outperforms existing StarGAN-VC methods on both objective and subjective metrics.
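The abstract does not spell out how the Siamese structure is wired into the discriminator, so the following is only a minimal sketch of the general SimSiam objective that the method's name refers to: a symmetric negative cosine similarity between one branch's predictor output and the other branch's projection. The function names and the use of plain Python lists are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def simsiam_loss(p1, z1, p2, z2):
    """Symmetric SimSiam-style loss for two augmented views.

    p1, p2: predictor outputs for view 1 and view 2 (hypothetical names).
    z1, z2: projection outputs for view 1 and view 2.
    In an autodiff framework z1 and z2 would be wrapped in a stop-gradient;
    plain Python has no gradients, so that step is only noted here.
    """
    return -0.5 * (cosine_similarity(p1, z2) + cosine_similarity(p2, z1))
```

The loss reaches its minimum of -1 when each prediction points in the same direction as the other view's projection, and is 0 when they are orthogonal.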

Publication: arXiv e-prints

Pub Date: September 2022

DOI: 10.48550/arXiv.2209.10088

arXiv: arXiv:2209.10088

Bibcode: 2022arXiv220910088S

Keywords: Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Artificial Intelligence; Computer Science - Machine Learning; Computer Science - Sound

E-Print: 12 pages, 3 figures, Accepted by ICONIP 2022
