Low-resource speech recognition and dialect identification of Irish in a multi-task framework

doi:10.48550/arXiv.2405.01293

This paper explores the use of Hybrid CTC/Attention encoder-decoder models trained with Intermediate CTC (InterCTC) for low-resource automatic speech recognition (ASR) and dialect identification (DID) of Irish (Gaelic). Results are compared to the current best-performing models trained for ASR (TDNN-HMM) and DID (ECAPA-TDNN). An optimal InterCTC setting is first established using a Conformer encoder. This setting is then used to train a model with an E-branchformer encoder, and the performance of the two architectures is compared. A multi-task fine-tuning approach is adopted for language model (LM) shallow fusion. The experiments yielded an improvement in DID accuracy of 10.8% relative to a baseline ECAPA-TDNN, and WER performance approaching that of the TDNN-HMM model. This multi-task approach emerges as a promising strategy for Irish low-resource ASR and DID.
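
The abstract does not spell out how the training objective is assembled, so the following is only a minimal sketch of how a Hybrid CTC/Attention loss is commonly combined with InterCTC terms. It assumes the usual formulation in which the intermediate CTC losses are averaged, interpolated with the final-layer CTC loss, and the result is interpolated with the attention decoder loss. The function name, the parameter names `ctc_weight` and `interctc_weight`, and their default values are illustrative assumptions, not settings taken from the paper.

```python
def hybrid_interctc_loss(att_loss, ctc_loss, inter_ctc_losses,
                         ctc_weight=0.3, interctc_weight=0.3):
    """Combine attention, final-layer CTC and intermediate CTC losses.

    All names and default weights here are hypothetical; the paper's
    actual settings are not given in the abstract.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    if not 0.0 <= interctc_weight <= 1.0:
        raise ValueError("interctc_weight must lie in [0, 1]")

    if inter_ctc_losses:
        # Average the CTC losses computed at the chosen intermediate layers.
        inter = sum(inter_ctc_losses) / len(inter_ctc_losses)
        # Interpolate the final-layer CTC loss with the intermediate average.
        ctc_total = (1.0 - interctc_weight) * ctc_loss + interctc_weight * inter
    else:
        # No intermediate layers selected: plain Hybrid CTC/Attention.
        ctc_total = ctc_loss

    # Standard hybrid interpolation between the CTC and attention branches.
    return ctc_weight * ctc_total + (1.0 - ctc_weight) * att_loss


if __name__ == "__main__":
    # Toy scalar losses, for illustration only.
    print(hybrid_interctc_loss(2.0, 4.0, [6.0, 8.0],
                               ctc_weight=0.5, interctc_weight=0.5))
```

In a multi-task setup of the kind described, the intermediate CTC targets could carry an auxiliary label such as the dialect, but which layers and targets the authors use is not stated here and is left open in this sketch.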

Publication:

arXiv e-prints

Pub Date:

May 2024

DOI:

10.48550/arXiv.2405.01293

arXiv:

arXiv:2405.01293

Bibcode:

2024arXiv240501293L

Keywords:

Computer Science - Computation and Language;
Computer Science - Artificial Intelligence;
Computer Science - Sound;
Electrical Engineering and Systems Science - Audio and Speech Processing

E-Print:

7 pages. Accepted to Odyssey 2024 - The Speaker and Language Recognition Workshop
