Attribution
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
arXiv 2024
CHAI: Clustered Head Attention for Efficient LLM Inference
from 2 papers
Bilge Acun
Carole-Jean Wu
Mostafa Elhoushi
Saurabh Agarwal
Ahmed A Aly
Ahmed Roman
Akshat Shrivastava
Anas Mahmoud
Beidi Chen
Bram Wasti