Attribution
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
arXiv 2024
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
from 2 papers
Jayashree Mohan
Ramachandran Ramjee
Ajay Nayak
Alexey Tumanov
Amey Agrawal
Bhargav S. Gulavani
Nipun Kwatra
Nitin Kedia
Ramya Prabhu