Hiva Mohammadzadeh

Attribution

2 papers

Authored papers

Squeezed Attention: Accelerating Long Context Length LLM Inference

arXiv 2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

arXiv 2024

No known affiliations.

Co-authors (from 2 papers)

Amir Gholami

Coleman Hooper

Kurt Keutzer

Michael W. Mahoney

Sehoon Kim

June Paik

Monishwaran Maheswaran

Yakun Sophia Shao