Attribution
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
arXiv 2024
Decoding Speculative Decoding
CHAI: Clustered Head Attention for Efficient LLM Inference
Cuttlefish: Low-Rank Model Training without All the Tuning
arXiv 2023
Co-authors (from 4 papers)
Basil Hosmer
Bilge Acun
Carole-Jean Wu
Dimitris Papailiopoulos
Mostafa Elhoushi
Shivaram Venkataraman
Ahmed A Aly
Ahmed Roman
Akshat Shrivastava
Anas Mahmoud