Spectrogram Feature Losses for Music Source Separation
Abstract
We study deep learning-based music source separation and explore an alternative to the standard pixel-level L2 loss on spectrograms for model training. Our main contribution is to demonstrate that adding a loss term on high-level features, extracted from the spectrograms using a VGG net, can improve separation quality compared with a purely pixel-level loss. We show this improvement for MMDenseNet, a state-of-the-art deep learning model for this task, when extracting drums and vocals from songs in the musdb18 database, which covers a broad range of Western music genres. We believe that this finding can be generalized and applied to broader machine learning-based systems in the audio domain.
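The abstract does not state the exact form of the combined objective. As a rough illustration only, the sketch below assumes a weighted sum of a pixel-level L2 term and an L2 term on feature maps. The names `toy_features`, `kernels`, and `feature_weight` are hypothetical, and the small fixed filters merely stand in for activations of a pretrained VGG layer, which is what the paper actually uses.

```python
import numpy as np


def pixel_l2_loss(pred, target):
    """Mean squared error between two magnitude spectrograms."""
    return float(np.mean((pred - target) ** 2))


def toy_features(spec, kernels):
    """Hypothetical stand-in for a VGG feature map.

    Applies each 2-D kernel to the spectrogram (valid cross-correlation)
    and passes the result through a ReLU. A real setup would instead take
    activations from a layer of a pretrained VGG network.
    """
    kh, kw = kernels.shape[1:]
    # windows has shape (H - kh + 1, W - kw + 1, kh, kw)
    windows = np.lib.stride_tricks.sliding_window_view(spec, (kh, kw))
    feats = np.einsum("ijkl,nkl->nij", windows, kernels)
    return np.maximum(feats, 0.0)


def combined_loss(pred, target, kernels, feature_weight=0.1):
    """Pixel-level L2 loss plus a weighted feature-level L2 loss.

    The weighted-sum form and the default weight are assumptions made
    for illustration, not values taken from the paper.
    """
    pixel = pixel_l2_loss(pred, target)
    feat_diff = toy_features(pred, kernels) - toy_features(target, kernels)
    feature = float(np.mean(feat_diff ** 2))
    return pixel + feature_weight * feature


# Usage with random data standing in for spectrograms (frequency x time).
rng = np.random.default_rng(0)
target_spec = np.abs(rng.normal(size=(16, 32)))
pred_spec = target_spec + 0.1
filters = rng.normal(size=(4, 3, 3))  # four hypothetical 3x3 filters
loss = combined_loss(pred_spec, target_spec, filters)
```

Setting `feature_weight` to zero recovers the pure pixel-level baseline, which makes it straightforward to compare the two training objectives on the same model.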
- Publication: arXiv e-prints
- Pub Date: January 2019
- arXiv: arXiv:1901.05061
- Bibcode: 2019arXiv190105061S
- Keywords: Computer Science - Sound; Computer Science - Machine Learning; Electrical Engineering and Systems Science - Audio and Speech Processing; Statistics - Machine Learning; 62; 68; I.2.6; H.5.5
- E-Print: Accepted for presentation at the 27th European Signal Processing Conference (EUSIPCO 2019)