Geonhwa Jeong

2 papers

Authored papers

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

arXiv 2024

Demystifying AI Platform Design for Distributed Inference of Next-Generation LLM models

arXiv 2024

No known affiliations.

Co-authors from 2 papers

Souvik Kundu

Tushar Krishna

Abhimanyu Bambhaniya

Hao Kang

Madhu Kumar

Midhilesh Elavazhagan

Qingru Zhang

Ritik Raj

Sudarshan Srinivasan

Suvinay Subramanian