Yineng Zhang

2 papers

Authored papers

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving

arXiv 2025

QQQ: Quality Quattuor-Bit Quantization for Large Language Models

arXiv 2024

No known affiliations.

Co-authors (from 2 papers)

Arvind Krishnamurthy

Baris Kasikci

Chao Wang

Chuan Liu

Jingyang Xiang

Lei Yu

Lequn Chen

Luis Ceze

Mincong Huang

Peng Zhang