Use of Speech Impairment Severity for Dysarthric Speech Recognition

doi:10.48550/arXiv.2305.10659

A key challenge in dysarthric speech recognition is speaker-level diversity, attributable both to speaker-identity-associated factors such as gender and to speech impairment severity. Most prior research addressing this issue has focused on speaker identity alone. To address this gap, this paper proposes a novel set of techniques that use both severity and speaker identity in dysarthric speech recognition: a) multitask training incorporating a severity prediction error; b) speaker-severity aware auxiliary feature adaptation; and c) structured LHUC transforms separately conditioned on speaker identity and severity. Experiments conducted on UASpeech suggest that incorporating speech impairment severity information into state-of-the-art hybrid DNN, E2E Conformer and pre-trained Wav2vec 2.0 ASR systems produces statistically significant WER reductions of up to 4.78% absolute (14.03% relative). Using the best system, the lowest published WER on UASpeech of 17.82% (51.25% on the very low intelligibility subset) was obtained.
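Techniques a) and c) can be illustrated with a minimal sketch, under stated assumptions that the abstract does not confirm. The sketch uses the standard LHUC amplitude 2*sigmoid(r), which is 1.0 (identity) at r = 0, and it assumes that "structured" LHUC composes a speaker-dependent amplitude and a severity-dependent amplitude by elementwise product. It also assumes the severity prediction error is a cross-entropy over discrete severity groups that is added to the ASR loss with a hypothetical weight. All function names (structured_lhuc, multitask_loss) and the weight parameter are illustrative and are not the authors' implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def lhuc_scale(r: list[float]) -> list[float]:
    """Standard LHUC amplitudes 2*sigmoid(r), in (0, 2); r = 0 gives 1.0."""
    return [2.0 * sigmoid(v) for v in r]


def structured_lhuc(h: list[float], r_speaker: list[float],
                    r_severity: list[float]) -> list[float]:
    """Hypothetical structured LHUC: each hidden unit is rescaled by a
    speaker-dependent amplitude and a severity-dependent amplitude.
    ASSUMPTION: the two transforms are composed by elementwise product."""
    if not (len(h) == len(r_speaker) == len(r_severity)):
        raise ValueError("hidden and LHUC parameter dimensions must match")
    a_spk = lhuc_scale(r_speaker)
    a_sev = lhuc_scale(r_severity)
    return [x * s * v for x, s, v in zip(h, a_spk, a_sev)]


def multitask_loss(asr_loss: float, severity_logits: list[float],
                   severity_label: int, weight: float = 0.1) -> float:
    """Hypothetical multitask objective: ASR loss plus a weighted
    cross-entropy severity prediction error.
    ASSUMPTION: severity is predicted as a class over discrete groups."""
    m = max(severity_logits)  # subtract the max logit for a stable log-sum-exp
    log_z = m + math.log(sum(math.exp(z - m) for z in severity_logits))
    severity_ce = log_z - severity_logits[severity_label]
    return asr_loss + weight * severity_ce


if __name__ == "__main__":
    # Hypothetical LHUC parameters: one vector per speaker ID and one per
    # severity group (UASpeech groups speakers by intelligibility).
    hidden = [0.5, -1.0, 2.0]
    r_spk = {"spk_A": [0.3, -0.2, 0.0]}
    r_sev = {"very_low": [0.1, 0.4, -0.5]}
    print(structured_lhuc(hidden, r_spk["spk_A"], r_sev["very_low"]))
    print(multitask_loss(2.3, [0.2, 1.5, -0.3, 0.0], severity_label=1))
```

Keeping the speaker and severity parameters in separate tables means that a severity group's transform can be shared across every speaker in that group. This is one plausible motivation for conditioning on the two factors separately rather than learning a single per-speaker transform.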

Publication:

arXiv e-prints

Pub Date:

May 2023

DOI:

10.48550/arXiv.2305.10659

arXiv:

arXiv:2305.10659

Bibcode:

2023arXiv230510659G

Keywords:

Electrical Engineering and Systems Science - Audio and Speech Processing;
Computer Science - Artificial Intelligence;
Computer Science - Machine Learning;
Computer Science - Sound

E-Print:

Accepted to INTERSPEECH 2023
