Yuki Mitsufuji
- Papers: 25
Authored papers
DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning
arXiv 2025
Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior
arXiv 2025
HumanGif: Single-View Human Diffusion with Generative Prior
arXiv 2025
Training Consistency Models with Variational Noise Coupling
arXiv 2025
A Comprehensive Real-World Assessment of Audio Watermarking Algorithms: Will They Survive Neural Codecs?
arXiv 2025
MeanFlow Transformers with Representation Autoencoders
arXiv 2025
CARE: Aligning Language Models for Regional Cultural Awareness
arXiv 2025
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
CVPR 2025 1
GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping
arXiv 2024
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning
arXiv 2024
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
arXiv 2024
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
arXiv 2024
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
arXiv 2024
MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
arXiv 2024
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
arXiv 2024
Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion
arXiv 2023
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer
arXiv 2023
GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration
arXiv 2023
STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events
NeurIPS 2023 11
PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives
arXiv 2023
Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects
arXiv 2022
ComFact: A Benchmark for Linking Contextual Commonsense Knowledge
arXiv 2022
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
arXiv 2022
D3Net: Densely connected multidilated DenseNet for music source separation
arXiv 2020
Frequent co-authors
10 from 25 papers