Locale Encoding For Scalable Multilingual Keyword Spotting Models
Abstract
A Multilingual Keyword Spotting (KWS) system detects spokenkeywords over multiple locales. Conventional monolingual KWSapproaches do not scale well to multilingual scenarios because ofhigh development/maintenance costs and lack of resource sharing.To overcome this limit, we propose two locale-conditioned universalmodels with locale feature concatenation and feature-wise linearmodulation (FiLM). We compare these models with two baselinemethods: locale-specific monolingual KWS, and a single universalmodel trained over all data. Experiments over 10 localized languagedatasets show that locale-conditioned models substantially improveaccuracy over baseline methods across all locales in different noiseconditions.FiLMperformed the best, improving on average FRRby 61% (relative) compared to monolingual KWS models of similarsizes.
- Publication:
-
arXiv e-prints
- Pub Date:
- February 2023
- DOI:
- 10.48550/arXiv.2302.12961
- arXiv:
- arXiv:2302.12961
- Bibcode:
- 2023arXiv230212961Z
- Keywords:
-
- Computer Science - Computation and Language;
- Computer Science - Machine Learning
- E-Print:
- Accepted for ICASSP 2023