Few-Shot Open-Set Learning for On-Device Customization of KeyWord Spotting Systems

doi:10.48550/arXiv.2306.02161


A personalized KeyWord Spotting (KWS) pipeline typically requires training a Deep Learning model on a large set of user-defined speech utterances, which prevents fast customization directly on-device. To fill this gap, this paper investigates few-shot learning methods for open-set KWS classification by combining a deep feature encoder with a prototype-based classifier. With user-defined keywords from 10 classes of the Google Speech Commands dataset, our study reports an accuracy of up to 76% in a 10-shot scenario while keeping the false acceptance rate of unknown data at 5%. In the analyzed settings, using the triplet loss to train an encoder with normalized output features performs better than prototypical networks jointly trained with a generator of dummy unknown-class prototypes. This design is also more effective than encoders trained on a classification problem, and it has fewer parameters than other iso-accuracy approaches.
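To make the idea of pairing a feature encoder with a prototype-based open-set classifier concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes the encoder already produced embedding vectors, that a class prototype is the mean of the L2-normalized few-shot embeddings, that similarity is measured with Euclidean distance, and that a query is rejected as unknown when its distance to the nearest prototype exceeds a threshold. The function names, the threshold, and the margin value are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_prototypes(support):
    """Average the normalized few-shot embeddings of each keyword.

    support: dict mapping a keyword label to a list of embedding vectors
    (assumed to come from a pretrained encoder, which is not shown here).
    """
    prototypes = {}
    for label, embeddings in support.items():
        normed = [l2_normalize(e) for e in embeddings]
        dim = len(normed[0])
        prototypes[label] = [
            sum(e[i] for e in normed) / len(normed) for i in range(dim)
        ]
    return prototypes


def classify_open_set(embedding, prototypes, threshold):
    """Return (label, distance); label is "unknown" if no prototype is close.

    The threshold is an assumed tuning knob that trades accuracy on known
    keywords against the false acceptance rate of unknown inputs.
    """
    query = l2_normalize(embedding)
    label, dist = min(
        ((lbl, euclidean(query, proto)) for lbl, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )
    return (label, dist) if dist <= threshold else ("unknown", dist)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss on normalized embeddings (margin is assumed)."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, euclidean(a, p) - euclidean(a, n) + margin)


# Toy 2-D embeddings standing in for encoder outputs (2 shots per keyword).
support = {
    "yes": [[1.0, 0.0], [0.9, 0.1]],
    "no": [[0.0, 1.0], [0.1, 0.9]],
}
prototypes = build_prototypes(support)
known = classify_open_set([1.0, 0.05], prototypes, threshold=0.5)
unknown = classify_open_set([-1.0, -1.0], prototypes, threshold=0.5)
```

In this toy setup the first query lands next to the "yes" prototype and is accepted, while the second is far from both prototypes and is rejected as unknown. Raising the threshold would accept more inputs at the cost of more false acceptances.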

Publication:

arXiv e-prints

Pub Date:

June 2023

DOI:

10.48550/arXiv.2306.02161

arXiv:

arXiv:2306.02161

Bibcode:

2023arXiv230602161R

Keywords:

Computer Science - Machine Learning

E-Print:

Accepted at INTERSPEECH 2023
