Audio Spectrogram Representations for Processing with Convolutional Neural Networks
Abstract
One decision that arises when designing a neural network for any application is how to represent the data that is presented to, and possibly generated by, the network. For audio, the choice is less obvious than it seems to be for visual images, and a variety of representations have been used for different applications, including the raw digitized sample stream, hand-crafted features, machine-discovered features, MFCCs and variants that include deltas, and a variety of spectral representations. This paper reviews some of these representations and the issues that arise, focusing particularly on spectrograms for generating audio using neural networks for style transfer.
- Publication: Proceedings of the First International Conference on Deep Learning and Music
- Pub Date: May 2017
- DOI:
- arXiv: arXiv:1706.09559
- Bibcode: 2017dlm..conf...37W
- Keywords: Computer Science - Sound; Computer Science - Machine Learning; Computer Science - Multimedia; Computer Science - Neural and Evolutionary Computing; 68Txx; C.1.3; H.5.1
- E-Print: Proceedings of the First International Conference on Deep Learning and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [cs.NE])