Attribution
Scaling Inference-Efficient Language Models
arXiv 2025
CHAI: Clustered Head Attention for Efficient LLM Inference
arXiv 2024
Decoding Speculative Decoding
from 3 papers
Minghao Yan
Saurabh Agarwal
Basil Hosmer
Bilge Acun
Carole-Jean Wu
Dimitris Papailiopoulos
Mostafa Elhoushi
Song Bian
Yejin Lee