Attribution
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
arXiv 2025
Differential Transformer
arXiv 2024
from 2 papers
Li Dong
Tianzhu Ye
Yutao Sun
Fan Yang
Furu Wei
Gao Huang
Hayden Kwok-Hay So
Lei Wang
Lingxiao Ma
Mao Yang