Yujun Lin

Papers: 11

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

arXiv 2026

LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

arXiv 2025

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

arXiv 2025

VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference

arXiv 2025

Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter

arXiv 2025

Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation

arXiv 2025

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

arXiv 2025

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

arXiv 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

arXiv 2024

TorchSparse: Efficient Point Cloud Inference Engine

arXiv 2022

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

arXiv 2017

Affiliations

No known affiliations.

Frequent co-authors

from 11 papers

Song Han

Muyang Li

Shang Yang

Xiuyu Li

Han Cai

Haocheng Xi

Haotian Tang

Ion Stoica

professor / co-founder

3 shared papers

Junxian Guo

3 shared papers

Kurt Keutzer

3 shared papers