Attribution
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving (arXiv 2025)
NanoFlow: Towards Optimal Large Language Model Serving Throughput (arXiv 2024)
Ray: A Distributed Framework for Emerging AI Applications (arXiv 2017)
People from 3 papers:
Arvind Krishnamurthy
Baris Kasikci
Zihao Ye
Alexey Tumanov
Chien-Yu Lin
Dedong Xie
Eric Liang
Gefei Zuo
Ion Stoica (professor / co-founder)
Kan Zhu