Xu Tan

Papers
31

Authored papers

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

arXiv 2026

2026

KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation

arXiv 2026

2026

UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization

arXiv 2026

2026

YuE: Scaling Open Foundation Models for Long-Form Music Generation

arXiv 2025

2025

Kimi-Audio Technical Report

arXiv 2025

2025

MoonCast: High-Quality Zero-Shot Podcast Generation

arXiv 2025

2025

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation

ICCV 2025

2025

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

arXiv 2025

2025

Chain-of-Model Learning for Language Model

arXiv 2025

2025

EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction

arXiv 2024

2024

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

arXiv 2024

2024

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

arXiv 2024

2024

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

arXiv 2024

2024

Foundation Models for Music: A Survey

arXiv 2024

2024

BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec

arXiv 2024

2024

EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms

arXiv 2024

2024

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

CVPR 2025 1

2024

Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training

arXiv 2024

2024

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

arXiv 2024

2024

CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval

arXiv 2023

2023

MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models

arXiv 2023

2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

arXiv 2023

2023

EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers

arXiv 2023

2023

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model

arXiv 2023

2023

MSS-PAE: Saving Autoencoder-based Outlier Detection from Unexpected Reconstruction

arXiv 2023

2023

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

arXiv 2022

2022

Empowering Diffusion Models on the Embedding Space for Text Generation

arXiv 2022

2022

SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint

arXiv 2020

2020

Accuracy Prediction with Non-neural Model for Neural Architecture Search

arXiv 2020

2020

MPNet: Masked and Permuted Pre-training for Language Understanding

NeurIPS 2020 12

2020

MASS: Masked Sequence to Sequence Pre-training for Language Generation

arXiv 2019

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 31 papers