Yang Sui
- Papers: 8
Authored papers (8)
TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload
arXiv 2026
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
arXiv 2025
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
arXiv 2025
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
arXiv 2025
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
arXiv 2025
HoliTom: Holistic Token Merging for Fast Video Large Language Models
arXiv 2025
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
arXiv 2024
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
CVPR 2025
Frequent co-authors
10 from 8 papers