Attribution
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
arXiv 2025
SpaLLM: Unified Compressive Adaptation of Large Language Models with Sketching
arXiv 2024
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
arXiv 2023
from 3 papers
Tianyi Zhang
Beidi Chen
Binhang Yuan
Ce Zhang
Christopher Ré
Jue Wang
Junda Su
Oscar Wu
Shaochen Zhong
Tianyi Zhou