A Deep Learning System for Domain-specific Speech Recognition

doi:10.48550/arXiv.2303.10510

Jia, Yanan

As human-machine voice interfaces provide easy access to increasingly intelligent machines, many state-of-the-art automatic speech recognition (ASR) systems have been proposed. However, commercial ASR systems usually perform poorly on domain-specific speech, especially in low-resource settings. The author works with pre-trained DeepSpeech2 and Wav2Vec2 acoustic models to develop benefit-specific ASR systems. The domain-specific data are collected using a proposed semi-supervised learning annotation method that requires little human intervention. The best performance comes from a fine-tuned Wav2Vec2-Large-LV60 acoustic model with an external KenLM language model, which surpasses the Google and AWS ASR systems on benefit-specific speech. The viability of using error-prone ASR transcriptions as part of spoken language understanding (SLU) is also investigated. Results of a benefit-specific natural language understanding (NLU) task show that the domain-specific fine-tuned ASR system can outperform the commercial ASR systems even when its transcriptions have a higher word error rate (WER), and that the NLU results obtained from the fine-tuned ASR transcriptions are similar to those obtained from human transcriptions.
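The comparisons above are stated in terms of word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The snippet below is a minimal sketch of that metric, assuming plain whitespace tokenization and no text normalization; the paper's exact scoring setup is not given in this abstract, and the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumes whitespace tokenization and no normalization (case, punctuation).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum edits to turn ref[:i-1] into hyp[:j];
    # only two rows of the dynamic-programming table are kept in memory.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical benefit-domain example: one substitution and two insertions
    # over five reference words.
    print(word_error_rate("what is my dental deductible",
                          "what is my dental deduct a bull"))
```

In this example "deductible" is substituted by "deduct" and "a" and "bull" are inserted, giving 3 / 5 = 0.6. Because insertions are counted, WER can exceed 1.0, and, as the abstract notes, a lower WER does not by itself guarantee better downstream NLU results.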

Publication:

arXiv e-prints

Pub Date:

March 2023

DOI:

10.48550/arXiv.2303.10510

arXiv:

arXiv:2303.10510

Bibcode:

2023arXiv230310510J

Keywords:

Computer Science - Computation and Language;
Computer Science - Machine Learning;
Computer Science - Sound;
Electrical Engineering and Systems Science - Audio and Speech Processing

E-Print:

4th International Conference on Natural Language Processing and Computational Linguistics (NLPCL 2023)
