Challenging margin-based speaker embedding extractors by using the variational information bottleneck

doi:10.48550/arXiv.2406.12622

Speaker embedding extractors are typically trained using a classification loss over the training speakers. During the last few years, the standard softmax/cross-entropy loss has been replaced by margin-based losses, yielding significant improvements in speaker recognition accuracy. Motivated by the fact that the margin merely reduces the logit of the target speaker during training, we consider a probabilistic framework that has a similar effect. The variational information bottleneck provides a principled mechanism for making deterministic nodes stochastic, resulting in an implicit reduction of the posterior of the target speaker. We experiment with a wide range of speaker recognition benchmarks and scoring methods and report results that are competitive with those obtained with the state-of-the-art Additive Angular Margin loss.
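The parallel drawn above can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions of its own: a Gaussian stochastic embedding z = mu + sigma * eps, a linear classifier head, and only the likelihood term of the bottleneck objective (the KL regulariser is omitted). The helper names (`aam_logits`, `vib_expected_log_posterior`) and all numbers, including scale 30 and margin 0.2, are hypothetical. The Additive Angular Margin replaces the target cosine cos(theta_t) with cos(theta_t + m), which lowers the target logit directly. With the stochastic embedding, the logits become W mu plus zero-mean noise; because log-sum-exp is convex, Jensen's inequality gives E[log p(target | z)] <= log p(target | mu), so sampling implicitly lowers the target log-posterior.

```python
import math
import random

def log_softmax_target(logits, target):
    """Log-posterior of class `target` under a softmax over `logits` (stable log-sum-exp)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[target] - lse

def aam_logits(cosines, target, scale=30.0, margin=0.2):
    """Hypothetical helper: Additive Angular Margin, cos(theta_t) -> cos(theta_t + m), then scale."""
    out = []
    for j, c in enumerate(cosines):
        if j == target:
            theta = math.acos(max(-1.0, min(1.0, c)))  # clamp for numerical safety
            c = math.cos(theta + margin)
        out.append(scale * c)
    return out

def vib_expected_log_posterior(mu, sigma, weights, target, n_pairs=500, seed=0):
    """Hypothetical helper: Monte Carlo estimate of E[log p(target | z)], z = mu + sigma * eps.

    Antithetic pairs (eps, -eps) keep the sample mean of the noise exactly zero,
    so each pair's average is bounded by the deterministic value (Jensen).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in mu]
        for sign in (1.0, -1.0):
            z = [m + sign * s * e for m, s, e in zip(mu, sigma, eps)]
            logits = [sum(w_i * z_i for w_i, z_i in zip(w, z)) for w in weights]
            total += log_softmax_target(logits, target)
    return total / (2 * n_pairs)

# Margin: the target log-posterior drops relative to plain scaled-cosine softmax.
cosines = [0.8, 0.3, -0.1]  # illustrative cosines to three speaker centres
plain = log_softmax_target([30.0 * c for c in cosines], 0)
margin = log_softmax_target(aam_logits(cosines, 0), 0)
print(margin < plain)  # True

# Stochastic embedding: the expected target log-posterior drops relative to using mu.
mu, sigma = [1.0, 0.5], [0.5, 0.5]
weights = [[2.0, 1.0], [0.5, -1.0], [-1.0, 0.5]]  # illustrative linear head
det = log_softmax_target([sum(w_i * m_i for w_i, m_i in zip(w, mu)) for w in weights], 0)
stoch = vib_expected_log_posterior(mu, sigma, weights, 0)
print(stoch < det)  # True
```

Both mechanisms push the training target posterior down, which is the effect the abstract relies on. The margin does this by a fixed, hand-set amount, while the stochastic layer's effect depends on the learned variance sigma.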

Publication:

arXiv e-prints

Pub Date:

June 2024

DOI:

10.48550/arXiv.2406.12622

arXiv:

arXiv:2406.12622

Bibcode:

2024arXiv240612622S

Keywords:

Electrical Engineering and Systems Science - Audio and Speech Processing

E-Print:

Accepted at Interspeech 2024
