Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models
Abstract
Code-switching (CS), the use of more than one language within a sentence, is common in daily conversations. CS speech recognition is difficult because the languages alternate and transcribed data are scarce. This paper therefore uses recently successful self-supervised learning (SSL) methods to leverage large amounts of unlabeled speech data that contain no CS. We show that the hidden representations of SSL models carry frame-level language identity even when the models are trained on English speech only. Jointly training CTC and language identification modules on top of self-supervised speech representations improves CS speech recognition performance. Furthermore, pre-training on multilingual speech data yields the best CS speech recognition performance.
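As a rough illustration of what jointly training CTC and language identification (LID) modules could look like, the sketch below combines a CTC loss over token labels with a frame-level LID cross-entropy. It is a minimal pure-Python sketch, not the paper's implementation: the function names, the weighted-sum form of the objective, the `lid_weight` value, and the toy inputs are all assumptions made for illustration. In practice both sets of logits would come from heads placed on top of the SSL model's hidden representations.

```python
import math

NEG_INF = float("-inf")

def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_softmax(logits):
    """Turn one frame of raw scores into log-probabilities."""
    z = logsumexp(*logits)
    return [x - z for x in logits]

def ctc_loss(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC (forward algorithm).

    log_probs: list of T frames, each a list of V log-probabilities.
    target:    list of token ids without blanks.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank].
    ext = [blank]
    for tok in target:
        ext += [tok, blank]
    S = len(ext)

    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]                      # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])          # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])          # skip the blank in between
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last token or on the trailing blank.
    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total

def lid_loss(lid_log_probs, lang_labels):
    """Mean frame-level cross-entropy against per-frame language labels."""
    nll = [-lp[y] for lp, y in zip(lid_log_probs, lang_labels)]
    return sum(nll) / len(nll)

def joint_loss(asr_logits, lid_logits, target, lang_labels, lid_weight=0.5):
    """Assumed joint objective: CTC loss plus a weighted LID loss."""
    asr_lp = [log_softmax(f) for f in asr_logits]
    lid_lp = [log_softmax(f) for f in lid_logits]
    return ctc_loss(asr_lp, target) + lid_weight * lid_loss(lid_lp, lang_labels)

# Toy usage: 2 frames, vocabulary {0: blank, 1, 2}, two languages {0, 1}.
asr_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
lid_logits = [[0.0, 0.0], [0.0, 0.0]]
loss = joint_loss(asr_logits, lid_logits, target=[1], lang_labels=[0, 1])
print(f"joint loss: {loss:.4f}")
```

The LID term here only shapes the shared representation during training. Whether the LID output is also used at decoding time, and how the two losses are actually weighted, is not specified in the abstract.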
- Publication: arXiv e-prints
- Pub Date: October 2021
- DOI: 10.48550/arXiv.2110.03504
- arXiv: arXiv:2110.03504
- Bibcode: 2021arXiv211003504T
- Keywords: Computer Science - Computation and Language; Electrical Engineering and Systems Science - Audio and Speech Processing
- E-Print: Submitted to ICASSP 2022