Rongjie Huang
Papers: 19
Authored papers
HeartMuLa: A Family of Open Sourced Music Foundation Models
arXiv 2026
OmniAudio: Generating Spatial Audio from 360-Degree Video
arXiv 2025
Versatile Framework for Song Generation with Prompt-based Control
arXiv 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
arXiv 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
arXiv 2024
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
arXiv 2024
Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching
arXiv 2024
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
arXiv 2024
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
arXiv 2024
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
arXiv 2024
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
arXiv 2024
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
arXiv 2023
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
ICCV 2023
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
arXiv 2023
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
arXiv 2023
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
arXiv 2023
Detector Guidance for Multi-Object Text-to-Image Generation
arXiv 2023
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
arXiv 2022
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
arXiv 2022