Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation

doi:10.48550/arXiv.2203.15479

Speech segmentation, which splits long speech into short segments, is essential for speech translation (ST). Popular voice activity detection (VAD) tools such as WebRTC VAD generally rely on pause-based segmentation. Unfortunately, pauses in speech do not necessarily match sentence boundaries, and sentences can be connected by a very short pause that is difficult for VAD to detect. In this study, we propose a speech segmentation method that uses a binary classification model trained on a segmented bilingual speech corpus. We also propose a hybrid method that combines VAD with this segmentation method. Experimental results show that the proposed method is better suited to cascade and end-to-end ST systems than conventional segmentation methods, and that the hybrid approach further improves translation performance.
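
The limitation of pause-based segmentation noted in the abstract can be illustrated with a toy sketch. This is not the authors' method and does not use the WebRTC VAD API; it assumes that some VAD has already produced one speech/non-speech flag per audio frame, and the function name `pause_based_segments` and the parameter `min_pause_frames` are hypothetical names chosen for illustration only.

```python
from typing import List, Tuple


def pause_based_segments(
    is_speech: List[bool], min_pause_frames: int
) -> List[Tuple[int, int]]:
    """Return (start, end) frame spans, end exclusive.

    A segment is closed only when a run of non-speech frames reaches
    min_pause_frames; shorter pauses stay inside the segment.
    """
    segments: List[Tuple[int, int]] = []
    start = None        # first frame of the segment being built
    last_speech = None  # most recent speech frame in that segment
    silence = 0         # length of the current non-speech run
    for i, speech in enumerate(is_speech):
        if speech:
            if start is None:
                start = i
            last_speech = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_pause_frames:
                segments.append((start, last_speech + 1))
                start = None
                silence = 0
    if start is not None:
        segments.append((start, last_speech + 1))
    return segments


# Hypothetical input: two "sentences" joined by a 1-frame pause,
# followed by a 4-frame pause and a third "sentence".
frames = [True] * 5 + [False] + [True] * 4 + [False] * 4 + [True] * 3
segments = pause_based_segments(frames, min_pause_frames=3)
print(segments)  # [(0, 10), (14, 17)]
```

With a threshold of three frames, the one-frame pause is not treated as a boundary, so the first two "sentences" are merged into a single segment. Lowering the threshold to one frame would split them, but it would also split at any brief hesitation inside a sentence, which is the mismatch between pauses and sentence boundaries that the abstract describes.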

Publication: arXiv e-prints

Pub Date: March 2022

DOI: 10.48550/arXiv.2203.15479

arXiv: arXiv:2203.15479

Bibcode: 2022arXiv220315479F

Keywords: Computer Science - Computation and Language; Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing

E-Print: Accepted to INTERSPEECH 2022
