Yang Sui

Papers: 8

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload

arXiv 2026

2026

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

arXiv 2025

2025

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float

arXiv 2025

2025

HoliTom: Holistic Token Merging for Fast Video Large Language Models

arXiv 2025

2025

Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models

arXiv 2025

2025

When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios

arXiv 2025

2025

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

arXiv 2024

2024

DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models

CVPR 2025 1

2024

Affiliations

No known affiliations.

Frequent co-authors

from 8 papers

Can Qin

Haoxuan You

Huan Wang

Keda Tao

Kele Shao

Shaochen Zhong

Tianyi Zhang

Xia Hu

Yuzhang Shang

Andrew Wen