Wei Xue
- Papers: 24
Authored papers
UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass
arXiv 2026
Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing
arXiv 2026
YuE: Scaling Open Foundation Models for Long-Form Music Generation
arXiv 2025
SongEval: A Benchmark Dataset for Song Aesthetics Evaluation
arXiv 2025
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
arXiv 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video
arXiv 2025
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
arXiv 2025
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing
arXiv 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
arXiv 2025
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis
arXiv 2025
Audio-FLAN: A Preliminary Release
arXiv 2025
ChatMusician: Understanding and Generating Music Intrinsically with LLM
arXiv 2024
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
arXiv 2024
You Know What I'm Saying: Jailbreak Attack via Implicit Reference
arXiv 2024
Importance Weighting Can Help Large Language Models Self-Improve
arXiv 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
CVPR 2025
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
arXiv 2024
ComposerX: Multi-Agent Symbolic Music Composition with LLMs
arXiv 2024
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
arXiv 2024
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
arXiv 2023
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
arXiv 2023
ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation
arXiv 2023
RJUA-QA: A Comprehensive QA Dataset for Urology
arXiv 2023
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
arXiv 2023
Frequent co-authors
10 from 24 papers