Few-Shot Drum Transcription in Polyphonic Music

doi:10.48550/arXiv.2008.02791

Data-driven approaches to automatic drum transcription (ADT) are often limited to a small, predefined vocabulary of percussion instrument classes. Such models cannot recognize out-of-vocabulary classes, nor can they adapt to finer-grained vocabularies. In this work, we address open-vocabulary ADT by introducing few-shot learning to the task. We train a Prototypical Network on a synthetic dataset and evaluate the model on multiple real-world ADT datasets with polyphonic accompaniment. We show that, given just a handful of selected examples at inference time, we can match and in some cases outperform a state-of-the-art supervised ADT approach under a fixed vocabulary setting. At the same time, we show that our model can successfully generalize to finer-grained or extended vocabularies unseen during training, a scenario where supervised approaches cannot operate at all. We provide a detailed analysis of our experimental results, including a breakdown of performance by sound class and by polyphony.
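
For readers unfamiliar with Prototypical Networks, the few-shot inference step mentioned in the abstract can be illustrated with a minimal sketch. It shows only the generic nearest-prototype rule (average the support embeddings of each class, then assign a query to the closest prototype), not the authors' model or training setup. The function names, the class labels, and the 2-D toy vectors are all assumptions made for illustration; in practice the embeddings would come from a trained network applied to audio.

```python
def prototypes(support):
    """Average each class's support embeddings into one prototype vector."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return protos


def classify(query, protos):
    """Return the label whose prototype is nearest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(protos, key=lambda label: sq_dist(query, protos[label]))


# Hypothetical toy embeddings standing in for the output of a trained encoder.
# Each class is described by a handful of labeled examples (the "shots").
support = {
    "kick": [[0.0, 0.1], [0.2, -0.1]],
    "snare": [[1.0, 1.1], [0.8, 0.9]],
}

protos = prototypes(support)
label = classify([0.9, 1.0], protos)  # query embedding close to the snare examples
```

Because a class is defined only by its support examples, a new or finer-grained class can be added at inference time by supplying a few examples for it, without retraining the encoder.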

Publication: arXiv e-prints

Pub Date: August 2020

DOI: 10.48550/arXiv.2008.02791

arXiv: arXiv:2008.02791

Bibcode: 2020arXiv200802791W

Keywords: Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing

E-Print: ISMIR 2020 camera-ready
