Basil Hosmer

Attribution

2 papers

Authored papers

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

arXiv 2024

CHAI: Clustered Head Attention for Efficient LLM Inference

arXiv 2024

No known affiliations.

Co-authors (from 2 papers)

Bilge Acun

Carole-Jean Wu

Mostafa Elhoushi

Saurabh Agarwal

Ahmed A Aly

Ahmed Roman

Akshat Shrivastava

Anas Mahmoud

Beidi Chen

Bram Wasti