0

Melody-Guided Music Generation

The Melody Guided Music Generation model uses a diffusion approach conditioned on melody representations to generate high-quality music that aligns with both provided audio style and text descriptions.

Year
2024
Venue
arXiv 2024
Authors
5
Hosting
Abstract onlyARXIV-DEFAULT

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2409.20196v4ARXIV-DEFAULT
TL;DR
Semantic Scholar
Attribution policy →

Abstract

We present the Melody-Guided Music Generation (MG2) model, a novel approach using melody to guide the text-to-music generation that, despite a simple method and limited resources, achieves excellent performance. Specifically, we first align the text with audio waveforms and their associated melodies using the newly proposed Contrastive Language-Music Pretraining, enabling the learned text representation fused with implicit melody information. Subsequently, we condition the retrieval-augmented diffusion module on both text prompt and retrieved melody. This allows MG2 to generate music that reflects the content of the given text description, meantime keeping the intrinsic harmony under the guidance of explicit melody information. We conducted extensive experiments on two public datasets: MusicCaps and MusicBench. Surprisingly, the experimental results demonstrate that the proposed MG2 model surpasses current open-source text-to-music generation models, achieving this with fewer than 1/3 of the parameters or less than 1/200 of the training data compared to state-of-the-art counterparts. Furthermore, we conducted comprehensive human evaluations involving three types of users and five perspectives, using newly designed questionnaires to explore the potential real-world applications of MG2.

Authors

5