Xie Chen

Associate professor at SJTU; researcher in speech recognition and multimodal speech-LLMs; lead on the open SLAM-LLM framework.

Role: professor
Currently at: Independent
Scholar: scholar.google.com/citations
Papers: 19

Attribution

Affiliations & profile: scholar.google.com/citations

Authored papers (19)

MOVA: Towards Scalable and Synchronized Video-Audio Generation

arXiv 2026

YuE: Scaling Open Foundation Models for Long-Form Music Generation

arXiv 2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

arXiv 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

arXiv 2025

MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation

arXiv 2025

URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models

arXiv 2025

MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark

arXiv 2025

STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence

arXiv 2025

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

arXiv 2025

MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization

arXiv 2025

EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

arXiv 2024

TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers

arXiv 2024

k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning

arXiv 2024

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

arXiv 2024

AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

arXiv 2024

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

arXiv 2024

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR

arXiv 2024

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

arXiv 2023

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

arXiv 2023

Affiliations

Currently at

professor · community

Frequent co-authors (10, from 19 papers)

Ziyang Ma

14 shared papers

Zhikang Niu

7 shared papers

Kai Yu

5 shared papers

Wenxi Chen

5 shared papers

Zhisheng Zheng

5 shared papers

Guanrou Yang

4 shared papers

Yifan Yang

4 shared papers

Chen Yang

3 shared papers

Jianwei Yu

3 shared papers

Ruibin Yuan

3 shared papers