Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models
Abstract
Continued improvements in machine learning techniques offer exciting new opportunities through the use of larger models and larger training datasets. However, there is a growing need to offer these new capabilities on-board low-powered devices such as smartphones, wearables, and other embedded environments where only limited memory is available. Towards this, we consider methods to reduce the size of Conformer-based speech recognition models, which typically require more than 100M parameters, down to just 5M parameters while minimizing impact on model quality. Such a model allows us to achieve always-on ambient speech recognition on edge devices with low-memory neural processors. We propose model weight reuse at different levels within our model architecture: (i) repeating full conformer block layers, (ii) sharing specific conformer modules across layers, (iii) sharing sub-components per conformer module, and (iv) sharing decomposed sub-component weights after low-rank decomposition. By sharing weights at different levels of our model, we can retain the full model in memory while increasing the number of virtual transformations applied to the input. Through a series of ablation studies and evaluations, we find that with weight sharing and a low-rank architecture, we can achieve WERs of 2.84 and 2.94 on the LibriSpeech dev-clean and test-clean sets, respectively, with a 5M parameter model.
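The combination of weight sharing and low-rank decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the class names (`LowRankSharedLinear`, `SharedStack`), the choice of PyTorch, and the dimension/rank/repeat values are illustrative assumptions. The sketch shows how a single pair of low-rank factors can be reused across several "virtual" layers, so parameter count stays fixed while the number of transformations applied to the input grows.

```python
import torch
import torch.nn as nn


class LowRankSharedLinear(nn.Module):
    """Hypothetical sketch: a d_model x d_model weight replaced by U @ V with rank r.

    Parameters drop from d*d to 2*d*r, and the same factor pair can be
    shared by several virtual layers (weight reuse).
    """

    def __init__(self, d_model: int = 256, rank: int = 32):
        super().__init__()
        self.U = nn.Parameter(torch.randn(d_model, rank) * 0.02)
        self.V = nn.Parameter(torch.randn(rank, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model); apply the factored projection
        return (x @ self.U) @ self.V


class SharedStack(nn.Module):
    """Apply one shared low-rank projection n_virtual times as a residual stack."""

    def __init__(self, d_model: int = 256, rank: int = 32, n_virtual: int = 4):
        super().__init__()
        self.proj = LowRankSharedLinear(d_model, rank)  # single parameter set
        self.norm = nn.LayerNorm(d_model)
        self.n_virtual = n_virtual

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.n_virtual):      # repeated (virtual) transformations
            x = x + self.norm(self.proj(x))  # residual reuse of the shared weights
        return x


if __name__ == "__main__":
    x = torch.randn(2, 100, 256)  # (batch, frames, features)
    y = SharedStack()(x)
    print(y.shape)                # torch.Size([2, 100, 256])
```

In the paper's terms, levels (i)-(iii) correspond to where such a module is reused (whole block, module, or sub-component), while level (iv) corresponds to sharing the factored `U`/`V` weights themselves.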
- Publication: arXiv e-prints
- Pub Date: March 2023
- DOI: 10.48550/arXiv.2303.08343
- arXiv: arXiv:2303.08343
- Bibcode: 2023arXiv230308343H
- Keywords: Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Artificial Intelligence; Computer Science - Machine Learning; Computer Science - Sound
- E-Print: Accepted to IEEE ICASSP 2023