Learning Interpretable Representation for Controllable Polyphonic Music Generation
Abstract
While deep generative models have become the leading methods for algorithmic composition, controlling the generation process remains challenging because the latent variables of most deep-learning models lack good interpretability. Inspired by the idea of content-style disentanglement, we design a novel architecture under the VAE framework that effectively learns two interpretable latent factors of polyphonic music: chord and texture. The current model focuses on 8-beat-long segments of piano compositions. We show that such chord-texture disentanglement provides a controllable generation pathway leading to a wide spectrum of applications, including compositional style transfer, texture variation, and accompaniment arrangement. Both objective and subjective evaluations show that our method achieves successful disentanglement and high-quality controlled music generation.
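The abstract's central idea, factoring the latent space into a chord variable and a texture variable, can be illustrated with a minimal sketch. The PyTorch model below is an assumption-laden toy, not the paper's released architecture: the two-encoder layout, the layer sizes, and the flattened 8-beat piano-roll input (128 pitches x 32 time steps) are all illustrative choices.

```python
import torch
import torch.nn as nn

class ChordTextureVAE(nn.Module):
    """Toy two-branch VAE: one latent factor for chord, one for texture.

    Illustrative only -- the dimensions and piano-roll encoding are
    assumptions for this sketch, not the paper's exact design.
    """
    def __init__(self, input_dim=128 * 32, z_dim=128):
        super().__init__()
        # Separate encoders so each latent factor is inferred independently
        self.chord_enc = nn.Sequential(nn.Linear(input_dim, 512), nn.ReLU())
        self.texture_enc = nn.Sequential(nn.Linear(input_dim, 512), nn.ReLU())
        # Gaussian posterior parameters for each latent factor
        self.chord_mu = nn.Linear(512, z_dim)
        self.chord_logvar = nn.Linear(512, z_dim)
        self.texture_mu = nn.Linear(512, z_dim)
        self.texture_logvar = nn.Linear(512, z_dim)
        # Decoder conditions on the concatenation of both factors
        self.decoder = nn.Sequential(
            nn.Linear(2 * z_dim, 512), nn.ReLU(), nn.Linear(512, input_dim)
        )

    @staticmethod
    def reparameterize(mu, logvar):
        # Standard VAE reparameterization trick
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, x):
        h_c, h_t = self.chord_enc(x), self.texture_enc(x)
        z_chord = self.reparameterize(self.chord_mu(h_c), self.chord_logvar(h_c))
        z_texture = self.reparameterize(self.texture_mu(h_t), self.texture_logvar(h_t))
        recon = self.decoder(torch.cat([z_chord, z_texture], dim=-1))
        return recon, z_chord, z_texture

model = ChordTextureVAE()
x_a, x_b = torch.rand(1, 128 * 32), torch.rand(1, 128 * 32)
_, z_chord_a, _ = model(x_a)
_, _, z_texture_b = model(x_b)
# Style transfer: chords of segment A rendered with the texture of segment B
transfer = model.decoder(torch.cat([z_chord_a, z_texture_b], dim=-1))
```

Under such a factorization, the controllable applications the abstract lists follow naturally: compositional style transfer amounts to decoding the chord latent of one segment with the texture latent of another (the final lines above), and texture variation amounts to resampling the texture latent while holding the chord latent fixed.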
- Publication: arXiv e-prints
- Pub Date: August 2020
- DOI: 10.48550/arXiv.2008.07122
- arXiv: arXiv:2008.07122
- Bibcode: 2020arXiv200807122W
- Keywords: Computer Science - Sound; Computer Science - Computation and Language; Computer Science - Machine Learning; Electrical Engineering and Systems Science - Audio and Speech Processing
- E-Print: In Proceedings of the 21st International Conference on Music Information Retrieval (ISMIR), Montreal, Canada, 2020