Shang Yang
Papers: 11
Authored papers (11)
Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
arXiv 2026
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
arXiv 2025
VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference
arXiv 2025
Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
arXiv 2025
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
arXiv 2025
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
arXiv 2024
NVILA: Efficient Frontier Visual Language Models
CVPR 2025
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
arXiv 2024
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
arXiv 2024
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
arXiv 2024
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
arXiv 2023
Frequent co-authors: 10 (from 11 papers)