Xie Chen

Associate professor at Shanghai Jiao Tong University (SJTU); researcher in speech recognition and multimodal speech LLMs; lead of the open-source SLAM-LLM framework.

Role
professor
Currently at
Independent
Papers
19

Authored papers

19

MOVA: Towards Scalable and Synchronized Video-Audio Generation

arXiv 2026

YuE: Scaling Open Foundation Models for Long-Form Music Generation

arXiv 2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

arXiv 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

arXiv 2025

MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation

arXiv 2025

URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models

arXiv 2025

MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark

arXiv 2025

STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence

arXiv 2025

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

arXiv 2025

MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization

arXiv 2025

k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning

arXiv 2024

EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

arXiv 2024

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

arXiv 2024

AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

arXiv 2024

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

arXiv 2024

TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers

arXiv 2024

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR

arXiv 2024

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

arXiv 2023

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

arXiv 2023

Affiliations

Currently at

Independent

professor · community

Frequent co-authors

10

from 19 papers