Attribution
Black-Box On-Policy Distillation of Large Language Models
arXiv 2025
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
Differential Transformer
arXiv 2024
Agent Attention: On the Integration of Softmax and Linear Attention
arXiv 2023
from 4 papers
Li Dong
Furu Wei
Gao Huang
Yuqing Xia
Yutao Sun
Dongchen Han
Fan Yang
Hayden Kwok-Hay So
Lei Wang
Lingxiao Ma