Attribution
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
arXiv 2024
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems
from 3 papers
Ramachandran Ramjee
Alexey Tumanov
Amey Agrawal
Ashish Panwar
Nipun Kwatra
Nitin Kedia
Ajay Nayak
Anmol Agarwal
Bhargav S. Gulavani
Ramya Prabhu