Sams-Net: A Sliced Attention-based Neural Network for Music Source Separation

doi:10.48550/arXiv.1909.05746

Convolutional neural network (CNN) and long short-term memory (LSTM) models that take spectrograms or waveforms as input are commonly used for deep-learning-based audio source separation. In this paper, we propose a Sliced Attention-based neural network (Sams-Net) in the spectrogram domain for the music source separation task. It enables spectral feature interactions through a multi-head attention mechanism, is easier to parallelize than LSTMs, and has a larger receptive field than CNNs. Experimental results on the MUSDB18 dataset show that the proposed method, with fewer parameters, outperforms most state-of-the-art DNN-based methods.

Publication: arXiv e-prints

Pub Date: September 2019

DOI: 10.48550/arXiv.1909.05746

arXiv: arXiv:1909.05746

Bibcode: 2019arXiv190905746L

Keywords: Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Information Retrieval; Computer Science - Machine Learning; Computer Science - Sound

E-Print: Submitted to Interspeech 2020
