Attribution
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
arXiv 2024
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
Striped Attention: Faster Ring Attention for Causal Transformers
arXiv 2023
from 3 papers
Jonathan Ragan-Kelley
William Brandon
Zachary Ankner
Christopher Rinard
Dan Alistarh
Kevin Qian
Mayank Mishra
Naigang Wang
Rameswar Panda
Rishab Parthasarathy