Audio-to-Score Alignment Using Deep Automatic Music Transcription
Abstract
Audio-to-score alignment (A2SA) is a multimodal task that consists of aligning audio signals to music scores. Recent literature confirms the benefits of Automatic Music Transcription (AMT) for A2SA at the frame level. In this work, we elaborate on the use of AMT Deep Learning (DL) models to achieve alignment at the note level. We propose a method that benefits from HMM-based score-to-score alignment and AMT, and show a remarkable advancement beyond the state of the art. We design a systematic procedure to take advantage of large datasets that do not offer an aligned score. Finally, we perform a thorough comparison and extensive tests on multiple datasets.
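To make the idea of note-level alignment through transcription concrete, the following is a minimal sketch and not the authors' method. It assumes the AMT model has already produced a list of (MIDI pitch, onset in seconds) pairs, and it swaps the HMM-based score-to-score alignment mentioned in the abstract for a plain edit-distance alignment over pitch sequences. The function names `align_notes` and `score_onsets`, and the toy data, are hypothetical.

```python
from typing import Dict, List, Tuple


def align_notes(score_pitches: List[int],
                transcribed: List[Tuple[int, float]]) -> List[Tuple[int, int]]:
    """Align score notes to transcribed notes with edit distance on pitch.

    Returns (score_index, transcribed_index) pairs for notes whose pitches
    match along the cheapest alignment path. Extra or missing notes in the
    transcription are left unmatched.
    """
    n, m = len(score_pitches), len(transcribed)
    # cost[i][j]: edit distance between the first i score notes
    # and the first j transcribed notes.
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if score_pitches[i - 1] == transcribed[j - 1][0] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitute
                             cost[i - 1][j] + 1,        # score note not played
                             cost[i][j - 1] + 1)        # extra transcribed note

    # Backtrack from the end, keeping only exact pitch matches.
    matches: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        sub = 0 if score_pitches[i - 1] == transcribed[j - 1][0] else 1
        if cost[i][j] == cost[i - 1][j - 1] + sub:
            if sub == 0:
                matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    matches.reverse()
    return matches


def score_onsets(score_pitches: List[int],
                 transcribed: List[Tuple[int, float]]) -> Dict[int, float]:
    """Map each matched score note index to the onset time found in the audio."""
    return {s: transcribed[t][1] for s, t in align_notes(score_pitches, transcribed)}


# Toy data: the transcription contains one spurious note (61) and misses the last one (65).
score = [60, 62, 64, 65]
amt_output = [(60, 0.5), (61, 0.9), (62, 1.0), (64, 1.6)]
onsets = score_onsets(score, amt_output)
```

In this toy case the first three score notes receive onset times from the transcription, while the last one stays unmatched. A real system would also have to handle chords, timing constraints, and transcription errors in pitch, which is where a probabilistic model such as an HMM is typically used.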
- Publication: arXiv e-prints
- Pub Date: July 2021
- DOI: 10.48550/arXiv.2107.12854
- arXiv: arXiv:2107.12854
- Bibcode: 2021arXiv210712854S
- Keywords: Computer Science - Sound; Computer Science - Multimedia; Electrical Engineering and Systems Science - Audio and Speech Processing
- E-Print: IEEE MMSP 2021 - ERRATUM