A Multi Level Data Fusion Approach for Speaker Identification on Telephone Speech

doi:10.48550/arXiv.1407.0380


Many speaker identification systems perform well on clean speech but degrade under noisy audio conditions. To address this problem, we investigate the use of complementary information at different levels to compute a combined match score for the unknown speaker. In this work, we observe the effect of two supervised machine learning approaches: support vector machines (SVM) and naïve Bayes (NB). We define two feature vector sets based on mel-frequency cepstral coefficients (MFCC) and relative spectral perceptual linear predictive coefficients (RASTA-PLP). Each feature set is modeled using a Gaussian mixture model (GMM). Several ways of combining these information sources give significant improvements in a text-independent speaker identification task on the very large, telephone-degraded NTIMIT database.
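The idea of computing a combined match score from two feature streams can be illustrated with a minimal sketch. It assumes each subsystem (one GMM set trained on MFCC, one on RASTA-PLP) has already produced an average log-likelihood per enrolled speaker. The function names, the speaker labels, the score values, and the weighted-sum rule are all assumptions made for illustration; the abstract names SVM and naïve Bayes as the learning approaches studied, and this simple weighted sum stands in for them only to show score-level fusion in general.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Scale one subsystem's per-speaker scores to zero mean and unit variance."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All speakers scored identically: this subsystem carries no evidence.
        return {spk: 0.0 for spk in scores}
    return {spk: (s - mu) / sigma for spk, s in scores.items()}


def fuse_scores(mfcc_scores, plp_scores, weight=0.5):
    """Weighted sum of normalized scores (hypothetical fusion rule).

    weight is the share given to the MFCC subsystem; both dictionaries
    must contain the same speaker labels.
    """
    a = z_normalize(mfcc_scores)
    b = z_normalize(plp_scores)
    return {spk: weight * a[spk] + (1 - weight) * b[spk] for spk in a}


def identify(mfcc_scores, plp_scores, weight=0.5):
    """Return the speaker label with the highest fused score."""
    fused = fuse_scores(mfcc_scores, plp_scores, weight)
    return max(fused, key=fused.get)


# Made-up average log-likelihoods for three hypothetical speakers.
mfcc_scores = {"spk_a": -41.2, "spk_b": -39.8, "spk_c": -44.0}
plp_scores = {"spk_a": -30.1, "spk_b": -33.5, "spk_c": -35.0}

best = identify(mfcc_scores, plp_scores)
```

Normalizing before summing keeps one subsystem from dominating merely because its log-likelihoods span a wider range. In this made-up example the two subsystems disagree on the top speaker, so the fused decision depends on `weight`.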

Publication:

arXiv e-prints

Pub Date:

June 2014

DOI:

10.48550/arXiv.1407.0380

arXiv:

arXiv:1407.0380

Bibcode:

2014arXiv1407.0380T

Keywords:

Computer Science - Sound;
Computer Science - Machine Learning

E-Print:

10 pages, 4 figures, International Journal of Signal Processing, Image Processing and Pattern Recognition Vol. 6, No. 2, April, 2013
