Attribution
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
arXiv 2025
QQQ: Quality Quattuor-Bit Quantization for Large Language Models
arXiv 2024
Authors from these 2 papers
Arvind Krishnamurthy
Baris Kasikci
Chao Wang
Chuan Liu
Jingyang Xiang
Lei Yu
Lequn Chen
Luis Ceze
Mincong Huang
Peng Zhang