Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation
Abstract
Speech segmentation, which splits long speech into short segments, is essential for speech translation (ST). Popular VAD tools like WebRTC VAD have generally relied on pause-based segmentation. Unfortunately, pauses in speech do not necessarily match sentence boundaries, and sentences can be connected by a very short pause that is difficult to detect by VAD. In this study, we propose a speech segmentation method using a binary classification model trained using a segmented bilingual speech corpus. We also propose a hybrid method that combines VAD and the above speech segmentation method. Experimental results revealed that the proposed method is more suitable for cascade and end-to-end ST systems than conventional segmentation methods. The hybrid approach further improved the translation performance.
- Publication:
-
arXiv e-prints
- Pub Date:
- March 2022
- DOI:
- 10.48550/arXiv.2203.15479
- arXiv:
- arXiv:2203.15479
- Bibcode:
- 2022arXiv220315479F
- Keywords:
-
- Computer Science - Computation and Language;
- Computer Science - Sound;
- Electrical Engineering and Systems Science - Audio and Speech Processing
- E-Print:
- Accepted to INTERSPEECH 2022