Dorien Herremans

JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata

arXiv 2025

Smart Timing for Mining: A Deep Learning Framework for Bitcoin Hardware ROI Prediction

arXiv 2025

NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards

arXiv 2025

JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment

arXiv 2025

ImprovNet -- Generating Controllable Musical Improvisations with Iterative Corruption Refinement

arXiv 2025

Towards Unified Music Emotion Recognition across Dimensional and Categorical Models

arXiv 2025

SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering

arXiv 2025

Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment

arXiv 2025

Text2midi: Generating Symbolic Music from Captions

arXiv 2024

MidiCaps: A large-scale MIDI dataset with text captions

arXiv 2024

MIRFLEX: Music Information Retrieval Feature Library for Extraction

arXiv 2024

Are We There Yet? A Brief Survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges

arXiv 2024

PRESENT: Zero-Shot Text-to-Prosody Control

arXiv 2024

Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction

arXiv 2024

BandControlNet: Parallel Transformers-based Steerable Popular Music Generation with Fine-Grained Spatiotemporal Features

arXiv 2024

Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey

arXiv 2024

DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts

arXiv 2024

Mustango: Toward Controllable Text-to-Music Generation

arXiv 2023

Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model

arXiv 2023

Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder

arXiv 2022

Predicting emotion from music videos: exploring the relative contribution of visual and auditory information to affective responses

arXiv 2022

PreBit -- A multimodal model with Twitter FinBERT embeddings for extreme price movement prediction of Bitcoin

arXiv 2022

HEAR: Holistic Evaluation of Audio Representations

arXiv 2022

Forecasting Bitcoin volatility spikes from whale transactions and CryptoQuant data using Synthesizer Transformer models

arXiv 2022

Understanding Audio Features via Trainable Basis Functions

arXiv 2022

Conditional Drums Generation using Compound Word Representations

arXiv 2022

Generating Lead Sheets with Affect: A Novel Conditional seq2seq Framework

arXiv 2021

A variational autoencoder for music generation controlled by tonal tension

arXiv 2020

Generative Modelling for Controllable Audio Synthesis of Expressive Piano Performance

arXiv 2020

Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling

arXiv 2020

The impact of Audio input representations on neural network based music transcription

arXiv 2020

The Effect of Spectrogram Reconstruction on Automatic Music Transcription: An Alternative Approach to Improve Transcription Accuracy

arXiv 2020

nnAudio: An on-the-fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolution Neural Networks

arXiv 2019

Midi Miner -- A Python library for tonal tension and track classification

arXiv 2019

Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks

arXiv 2019

Multimodal Deep Models for Predicting Affective Responses Evoked by Movies

arXiv 2019