A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning
Abstract
Aphasia is a language disorder that affects the speaking ability of millions of patients. This paper presents a new benchmark for Aphasia speech recognition and detection tasks using state-of-the-art speech recognition techniques with the AphsiaBank dataset. Specifically, we introduce two multi-task learning methods based on the CTC/Attention architecture to perform both tasks simultaneously. Our system achieves state-of-the-art speaker-level detection accuracy (97.3%), and a relative WER reduction of 11% for moderate Aphasia patients. In addition, we demonstrate the generalizability of our approach by applying it to another disordered speech database, the DementiaBank Pitt corpus. We will make our all-in-one recipes and pre-trained model publicly available to facilitate reproducibility. Our standardized data preprocessing pipeline and open-source recipes enable researchers to compare results directly, promoting progress in disordered speech processing.
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2023
- DOI:
- arXiv:
- arXiv:2305.13331
- Bibcode:
- 2023arXiv230513331T
- Keywords:
-
- Electrical Engineering and Systems Science - Audio and Speech Processing;
- Computer Science - Computation and Language
- E-Print:
- Accepted at INTERSPEECH 2023. Code: https://github.com/espnet/espnet