Zhen Ye
- Papers: 9
Authored papers
YuE: Scaling Open Foundation Models for Long-Form Music Generation
arXiv 2025
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
arXiv 2025
SpaceVista: All-Scale Visual Spatial Reasoning from mm to km
arXiv 2025
AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
arXiv 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
arXiv 2025
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis
arXiv 2025
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
arXiv 2024
ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges
arXiv 2024
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
arXiv 2023
Frequent co-authors
10 from 9 papers