Audio-to-Score Alignment Using Deep Automatic Music Transcription

doi:10.48550/arXiv.2107.12854

Audio-to-score alignment (A2SA) is a multimodal task that consists of aligning audio signals to music scores. Recent literature confirms the benefits of Automatic Music Transcription (AMT) for A2SA at the frame level. In this work, we extend the use of Deep Learning (DL) AMT models to achieve alignment at the note level. We propose a method that combines HMM-based score-to-score alignment with AMT and improves markedly on the state of the art. We also design a systematic procedure for exploiting large datasets that do not provide an aligned score. Finally, we perform a thorough comparison and extensive tests on multiple datasets.

Publication: arXiv e-prints
Pub Date: July 2021
DOI: 10.48550/arXiv.2107.12854
arXiv: arXiv:2107.12854
Bibcode: 2021arXiv210712854S
Keywords: Computer Science - Sound; Computer Science - Multimedia; Electrical Engineering and Systems Science - Audio and Speech Processing
E-Print: IEEE MMSP 2021 - ERRATUM
