Attribution
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
arXiv 2025
SageAttention2++: A More Efficient Implementation of SageAttention2
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
arXiv 2024
from 3 papers
Haofeng Huang
Jia Wei
Jianfei Chen
Jintao Zhang
Jun Zhu
Xiaoming Xu
Chaojun Xiao
Chendong Xiang
Guangxuan Xiao
Haoxu Wang