Yujun Lin
Papers: 11
Authored papers
Flash-KMeans: Fast and Memory-Efficient Exact K-Means
arXiv 2026
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
arXiv 2025
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
arXiv 2025
VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference
arXiv 2025
Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
arXiv 2025
Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
arXiv 2025
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
arXiv 2025
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
arXiv 2024
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
arXiv 2024
TorchSparse: Efficient Point Cloud Inference Engine
arXiv 2022
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Frequent co-authors
10 from 11 papers