Shiliang Zhang
Authored papers (17)
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
arXiv 2026
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
arXiv 2025
MagCache: Fast Video Generation with Magnitude-Aware Cache
arXiv 2025
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting
arXiv 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video
arXiv 2025
Differentiable Reward Optimization for LLM based TTS system
arXiv 2025
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
arXiv 2025
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
arXiv 2024
MV-VTON: Multi-View Virtual Try-On with Diffusion Models
arXiv 2024
Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
arXiv 2024
SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability
arXiv 2023
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
arXiv 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
arXiv 2023
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
arXiv 2023
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
CVPR 2024
ParCNetV2: Oversized Kernel with Enhanced Attention
ICCV 2023
Large-Scale Spatio-Temporal Person Re-identification: Algorithms and Benchmark
arXiv 2021
Frequent co-authors
10 from 17 papers