Semi-Supervised Singing Voice Separation with Noisy Self-Training

Recent progress in singing voice separation has primarily focused on supervised deep learning methods. However, the scarcity of ground-truth data with clean musical sources has long been a problem. Given a limited set of labeled data, we present a method that leverages a large volume of unlabeled data to improve the model's performance. Following the noisy self-training framework, we first train a teacher network on the small labeled dataset and infer pseudo-labels from the large corpus of unlabeled mixtures. Then, a larger student network is trained on the combined ground-truth and self-labeled datasets. Empirical results show that the proposed self-training scheme, along with data augmentation methods, effectively leverages the large unlabeled corpus and obtains superior performance compared to supervised methods.
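
The teacher/student loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's implementation: ridge regression stands in for the separation networks, squared features stand in for the "larger" student, additive Gaussian noise stands in for the data augmentation (which the abstract does not specify), and the names fit_ridge, expand, augment, and noisy_self_training are hypothetical.

```python
import numpy as np

def fit_ridge(X, Y, lam=1e-2):
    """Closed-form ridge regression: returns W so that X @ W approximates Y.
    Toy stand-in for training a separation network (assumption)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def expand(X):
    """Hypothetical 'larger student': append squared features for extra capacity."""
    return np.hstack([X, X ** 2])

def augment(X, rng, noise_std=0.05):
    """Toy input noise: additive Gaussian noise on the mixture features (assumption)."""
    return X + noise_std * rng.standard_normal(X.shape)

def noisy_self_training(X_lab, Y_lab, X_unlab, rng):
    # Step 1: train the teacher on the small labeled set.
    W_teacher = fit_ridge(X_lab, Y_lab)
    # Step 2: infer pseudo-labels on the clean (un-augmented) unlabeled mixtures.
    Y_pseudo = X_unlab @ W_teacher
    # Step 3: train the larger student on ground-truth plus pseudo-labeled data,
    # with noise applied to the student's inputs only.
    X_all = np.vstack([X_lab, X_unlab])
    Y_all = np.vstack([Y_lab, Y_pseudo])
    W_student = fit_ridge(expand(augment(X_all, rng)), Y_all)
    return W_teacher, W_student
```

In this sketch, rows of X play the role of mixture features and rows of Y the role of vocal targets. A trained student would be applied as expand(X_new) @ W_student.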

Publication: arXiv e-prints

Pub Date: February 2021

DOI: 10.48550/arXiv.2102.07961

arXiv: arXiv:2102.07961

Bibcode: 2021arXiv210207961W

Keywords: Electrical Engineering and Systems Science - Audio and Speech Processing

E-Print: Accepted at 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)
