
Wei Xue

Papers
24


Authored papers

24

UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass

arXiv 2026

2026

Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing

arXiv 2026

2026

YuE: Scaling Open Foundation Models for Long-Form Music Generation

arXiv 2025

2025

SongEval: A Benchmark Dataset for Song Aesthetics Evaluation

arXiv 2025

2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

arXiv 2025

2025

OmniAudio: Generating Spatial Audio from 360-Degree Video

arXiv 2025

2025

LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement

arXiv 2025

2025

ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing

arXiv 2025

2025

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

arXiv 2025

2025

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

arXiv 2025

2025

Audio-FLAN: A Preliminary Release

arXiv 2025

2025

ChatMusician: Understanding and Generating Music Intrinsically with LLM

arXiv 2024

2024

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

arXiv 2024

2024

You Know What I'm Saying: Jailbreak Attack via Implicit Reference

arXiv 2024

2024

Importance Weighting Can Help Large Language Models Self-Improve

arXiv 2024

2024

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

CVPR 2025

2024

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

arXiv 2024

2024

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

arXiv 2024

2024

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation

arXiv 2024

2024

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

arXiv 2023

2023

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model

arXiv 2023

2023

ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation

arXiv 2023

2023

RJUA-QA: A Comprehensive QA Dataset for Urology

arXiv 2023

2023

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

arXiv 2023

2023

Affiliations

No known affiliations.

Frequent co-authors

10

from 24 papers