Analysing Deep Learning-Spectral Envelope Prediction Methods for Singing Synthesis

doi:10.48550/arXiv.1903.01161

We investigate several hyper-parameters of neural networks used to generate spectral envelopes for singing synthesis. Two perceptual tests are performed: the first compares two models directly, and the second ranks models by mean opinion score. With these tests we show that, when learning to predict spectral envelopes, 2D convolutions are superior to the previously proposed 1D convolutions, and that predicting multiple frames in an iterated fashion during training is superior to injecting noise into the input data. An experimental investigation of whether to learn to predict a probability distribution or single samples was inconclusive. We propose a network architecture that incorporates the improvements we found useful, and our experiments show that this network produces better results than other state-of-the-art methods.

Publication:

arXiv e-prints

Pub Date:

March 2019

DOI:

10.48550/arXiv.1903.01161

arXiv:

arXiv:1903.01161

Bibcode:

2019arXiv190301161B

Keywords:

Electrical Engineering and Systems Science - Audio and Speech Processing;
Computer Science - Sound

E-Print:

Published in Proceedings of the 27th European Signal Processing Conference (EUSIPCO), 2019
