Acoustic modeling for Overlapping Speech Recognition: JHU CHiME-5 Challenge System

This paper summarizes our acoustic modeling efforts in the Johns Hopkins University speech recognition system for the CHiME-5 challenge, which targets recognition of highly overlapped dinner-party speech recorded by multiple microphone arrays. We explore data augmentation approaches, neural network architectures, front-end speech dereverberation, beamforming, and robust i-vector extraction, comparing our in-house implementations with publicly available tools. Our final system achieves a word error rate of 69.4% on the development set, an 11.7% absolute improvement over the previous baseline of 81.1%, and we release this improved baseline, with its refined techniques and tools, as an advanced CHiME-5 recipe.

Publication: arXiv e-prints

Pub Date: May 2024

DOI: 10.48550/arXiv.2405.11078

arXiv: arXiv:2405.11078

Bibcode: 2024arXiv240511078M

Keywords: Electrical Engineering and Systems Science - Audio and Speech Processing

E-Print: Published in: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
