Tackling the Score Shift in Cross-Lingual Speaker Verification by Exploiting Language Information

doi:10.48550/arXiv.2110.09150

Tackling the Score Shift in Cross-Lingual Speaker Verification by Exploiting Language Information

This paper contains a post-challenge performance analysis on cross-lingual speaker verification of the IDLab submission to the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). We show that current speaker embedding extractors consistently underestimate speaker similarity in within-speaker cross-lingual trials. Consequently, the typical training and scoring protocols do not put enough emphasis on the compensation of intra-speaker language variability. We propose two techniques to increase cross-lingual speaker verification robustness. First, we enhance our previously proposed Large-Margin Fine-Tuning (LM-FT) training stage with a mini-batch sampling strategy which increases the amount of intra-speaker cross-lingual samples within the mini-batch. Second, we incorporate language information in the logistic regression calibration stage. We integrate quality metrics based on soft and hard decisions of a VoxLingua107 language identification model. The proposed techniques result in a 11.7% relative improvement over the baseline model on the VoxSRC-21 test set and contributed to our third place finish in the corresponding challenge.

Publication:

arXiv e-prints

Pub Date:

October 2021

DOI:

10.48550/arXiv.2110.09150

arXiv:

arXiv:2110.09150

Bibcode:

2021arXiv211009150T

Keywords:

Electrical Engineering and Systems Science - Audio and Speech Processing;
Computer Science - Sound

E-Print:

proceedings of ICASSP 2022

NASA/ADS

Tackling the Score Shift in Cross-Lingual Speaker Verification by Exploiting Language Information

Abstract