Analysing Deep Learning-Spectral Envelope Prediction Methods for Singing Synthesis
Abstract
We investigate several hyper-parameters of neural networks used to generate spectral envelopes for singing synthesis. We perform two perceptual tests: the first compares two models directly, and the second ranks models by mean opinion score. These tests show that, when learning to predict spectral envelopes, 2D convolutions are superior to the previously proposed 1D convolutions, and that predicting multiple frames in an iterated fashion during training is superior to injecting noise into the input data. An experimental investigation of whether it is better to learn to predict a probability distribution or single samples was inconclusive. We propose a network architecture that incorporates the improvements we found useful, and our experiments show that this network produces better results than other state-of-the-art methods.
- Publication: arXiv e-prints
- Pub Date: March 2019
- DOI: 10.48550/arXiv.1903.01161
- arXiv: arXiv:1903.01161
- Bibcode: 2019arXiv190301161B
- Keywords: Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Sound
- E-Print: Published in Proceedings of the 27th European Signal Processing Conference (EUSIPCO), 2019