Attribution
Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference
arXiv 2024
Hao Mark Chen
Hongxiang Fan
Konstantin Mishchenko
Rui Li
Stylianos I. Venieris
Wayne Luk