Attribution
SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget
arXiv 2024
from 1 paper
Bin Cui
ZiHao Wang