A Dual-Decoder Conformer for Multilingual Speech Recognition

doi:10.48550/arXiv.2109.03277

N, Krishna D

Transformer-based models have recently become very popular for sequence-to-sequence applications such as machine translation and speech recognition. This work proposes a dual-decoder transformer model for low-resource multilingual speech recognition of Indian languages. Our proposed model consists of a Conformer [1] encoder, two parallel transformer decoders, and a language classifier. We use a phoneme decoder (PHN-DEC) for the phoneme recognition task and a grapheme decoder (GRP-DEC) to predict the grapheme sequence along with language information. We treat phoneme recognition and language identification as auxiliary tasks in a multi-task learning framework. We jointly optimize the network for the phoneme recognition, grapheme recognition, and language identification tasks with Joint CTC-Attention [2] training. Our experiments show a significant reduction in word error rate (WER) over the baseline approaches. We also show that our dual-decoder approach obtains a significant improvement over the single-decoder approach.
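The joint optimization described in the abstract can be illustrated with a minimal sketch of how the per-task losses might be combined. The abstract does not give the weighting scheme or say where the CTC branch attaches, so everything below is an assumption for illustration: the function names, the weights (`ctc_weight`, `phoneme_weight`, `lid_weight`), and the choice to pair a CTC loss with each decoder's attention loss. Only the interpolation form w * L_ctc + (1 - w) * L_att is the standard joint CTC-attention objective.

```python
def joint_ctc_attention(ctc_loss, att_loss, ctc_weight=0.3):
    """Standard joint CTC-attention interpolation: w * L_ctc + (1 - w) * L_att."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


def multitask_loss(grapheme_losses, phoneme_losses, lid_loss,
                   ctc_weight=0.3, phoneme_weight=0.3, lid_weight=0.1):
    """Combine the primary grapheme loss with two auxiliary losses.

    grapheme_losses / phoneme_losses are (ctc_loss, attention_loss) pairs.
    All weights here are hypothetical, not values from the paper.
    """
    # Primary task: grapheme recognition (GRP-DEC in the abstract).
    grapheme = joint_ctc_attention(*grapheme_losses, ctc_weight=ctc_weight)
    # Auxiliary task: phoneme recognition (PHN-DEC in the abstract).
    phoneme = joint_ctc_attention(*phoneme_losses, ctc_weight=ctc_weight)
    # Auxiliary task: language identification, e.g. a cross-entropy term.
    return grapheme + phoneme_weight * phoneme + lid_weight * lid_loss


if __name__ == "__main__":
    # Placeholder scalar losses, chosen only to exercise the function.
    total = multitask_loss((2.0, 4.0), (1.0, 3.0), 1.0,
                           ctc_weight=0.5, phoneme_weight=0.5, lid_weight=0.5)
    print(total)
```

In a real training loop the scalar inputs would be per-batch loss tensors from a deep learning framework; plain floats are used here so the sketch runs without dependencies.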

Publication:

arXiv e-prints

Pub Date:

August 2021

DOI:

10.48550/arXiv.2109.03277

arXiv:

arXiv:2109.03277

Bibcode:

2021arXiv210903277N

Keywords:

Computer Science - Computation and Language;
Computer Science - Sound;
Electrical Engineering and Systems Science - Audio and Speech Processing

E-Print:

5 pages
