Attribution
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
arXiv 2024
Squeezed Attention: Accelerating Long Context Length LLM Inference
from 2 papers
Amir Gholami
Coleman Hooper
Kurt Keutzer
Michael W. Mahoney
Sehoon Kim
June Paik
Monishwaran Maheswaran
Yakun Sophia Shao