Xu Tan
- Papers: 31
Authored papers
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
arXiv 2026
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation
arXiv 2026
UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization
arXiv 2026
YuE: Scaling Open Foundation Models for Long-Form Music Generation
arXiv 2025
Kimi-Audio Technical Report
arXiv 2025
MoonCast: High-Quality Zero-Shot Podcast Generation
arXiv 2025
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation
ICCV 2025
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis
arXiv 2025
Chain-of-Model Learning for Language Model
arXiv 2025
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction
arXiv 2024
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
arXiv 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
arXiv 2024
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
arXiv 2024
Foundation Models for Music: A Survey
arXiv 2024
BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec
arXiv 2024
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms
arXiv 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
CVPR 2025
Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training
arXiv 2024
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
arXiv 2024
CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval
arXiv 2023
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
arXiv 2023
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
arXiv 2023
EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers
arXiv 2023
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
arXiv 2023
MSS-PAE: Saving Autoencoder-based Outlier Detection from Unexpected Reconstruction
arXiv 2023
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
arXiv 2022
Empowering Diffusion Models on the Embedding Space for Text Generation
arXiv 2022
SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint
arXiv 2020
Accuracy Prediction with Non-neural Model for Neural Architecture Search
arXiv 2020
MPNet: Masked and Permuted Pre-training for Language Understanding
NeurIPS 2020
MASS: Masked Sequence to Sequence Pre-training for Language Generation
arXiv 2019
Frequent co-authors: 10, from 31 papers