Toroidal Probabilistic Spherical Discriminant Analysis

doi:10.48550/arXiv.2210.15441

In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring back-ends are commonly used: cosine scoring and PLDA. We have recently proposed PSDA, an analog to PLDA that uses von Mises-Fisher distributions instead of Gaussians. In this paper, we present toroidal PSDA (T-PSDA). It extends PSDA with the ability to model within-speaker and between-speaker variabilities in toroidal submanifolds of the hypersphere. Like PLDA and PSDA, the model allows closed-form scoring and closed-form EM updates for training. On VoxCeleb, we find T-PSDA accuracy on par with cosine scoring, while PLDA accuracy is inferior. On NIST SRE'21, we find that T-PSDA gives large accuracy gains compared to both cosine scoring and PLDA.

Publication: arXiv e-prints

Pub Date: October 2022

DOI: 10.48550/arXiv.2210.15441

arXiv: arXiv:2210.15441

Bibcode: 2022arXiv221015441S

Keywords: Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing; Statistics - Machine Learning

E-Print: Submitted to ICASSP 2023
