Melody Is All You Need For Music Generation

doi:10.48550/arXiv.2409.20196

We present the Melody Guided Music Generation (MMGen) model, the first approach to use melody to guide music generation; despite a simple method and extremely limited resources, it achieves excellent performance. Specifically, we first align the melody with audio waveforms and their associated descriptions using the multimodal alignment module. We then condition the diffusion module on the learned melody representations. This allows MMGen to generate music that matches the style of the provided audio while reflecting the content of the given text description. To address the scarcity of high-quality data, we construct a multimodal dataset, MusicSet, which includes melody, text, and audio and will be made publicly available. Extensive experiments demonstrate the superiority of the proposed model in terms of both experimental metrics and actual performance quality.

Publication: arXiv e-prints
Pub Date: September 2024
DOI: 10.48550/arXiv.2409.20196
arXiv: arXiv:2409.20196
Bibcode: 2024arXiv240920196W
Keywords: Computer Science - Sound; Computer Science - Artificial Intelligence; Electrical Engineering and Systems Science - Audio and Speech Processing
E-Print: 9 pages, 1 figure, 2 tables

