Attribution
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
arXiv 2024
Demystifying AI Platform Design for Distributed Inference of Next-Generation LLM models
from 2 papers
Souvik Kundu
Tushar Krishna
Abhimanyu Bambhaniya
Hao Kang
Madhu Kumar
Midhilesh Elavazhagan
Qingru Zhang
Ritik Raj
Sudarshan Srinivasan
Suvinay Subramanian