VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance

doi:10.48550/arXiv.2409.15759

VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance

When applying parameter-efficient finetuning via LoRA onto speaker adaptive text-to-speech models, adaptation performance may decline compared to full-finetuned counterparts, especially for out-of-domain speakers. Here, we propose VoiceGuider, a parameter-efficient speaker adaptive text-to-speech system reinforced with autoguidance to enhance the speaker adaptation performance, reducing the gap against full-finetuned models. We carefully explore various ways of strengthening autoguidance, ultimately finding the optimal strategy. VoiceGuider as a result shows robust adaptation performance especially on extreme out-of-domain speech data. We provide audible samples in our demo page.

Publication:

arXiv e-prints

Pub Date:

September 2024

DOI:

10.48550/arXiv.2409.15759

arXiv:

arXiv:2409.15759

Bibcode:

2024arXiv240915759Y

Keywords:

Computer Science - Sound;
Electrical Engineering and Systems Science - Audio and Speech Processing

E-Print:

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025, Demo Page: https://voiceguider.github.io/

ADS

VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance

Abstract