We introduce Kimi Linear, a hybrid linear attention architecture that, for the first time, outperforms full attention under fair comparisons across various scenarios, including short-context, long-context, and reinforcement learning (RL) scaling regimes. At its core lies Kimi Delta Attention (KDA), an expressive linear attention module that extends Gated DeltaNet with a finer-grained gating mechanism, enabling more effective use of limited finite-state RNN memory. Our bespoke chunkwise algorithm achieves high hardware efficiency through a specialized variant of the Diagonal-Plus-Low-Rank (DPLR) transition matrices, which substantially reduces computation compared to the general DPLR formulation while remaining more consistent with the classical delta rule. We pretrain a Kimi Linear model with 3B activated parameters and 48B total parameters, based on a layerwise hybrid of KDA and Multi-Head Latent Attention (MLA). Our experiments show that, with an identical training recipe, Kimi Linear outperforms full MLA by a sizeable margin across all evaluated tasks, while reducing KV cache usage by up to 75% and achieving up to 6 times higher decoding throughput at a 1M-token context. These results demonstrate that Kimi Linear can serve as a drop-in replacement for full attention architectures with superior performance and efficiency, including on tasks with longer input and output lengths. To support further research, we open-source the KDA kernel and vLLM implementations, and release the pre-trained and instruction-tuned model checkpoints.
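To make the finer-grained gating concrete, here is a minimal single-step sketch of a KDA-style recurrence. It assumes a state S of shape d_k × d_v, a channel-wise decay vector α_t in (0, 1)^d_k applied as Diag(α_t) before the delta-rule correction, and a scalar write strength β_t, i.e. S_t = (I − β_t k_t k_tᵀ) Diag(α_t) S_{t−1} + β_t k_t v_tᵀ with readout o_t = S_tᵀ q_t. This is our reading of "Gated DeltaNet with a finer-grained gating mechanism" (Gated DeltaNet uses a single scalar decay per head), not the authors' code; the function name `kda_step` is hypothetical, and the loop is the naive token-by-token recurrence rather than the chunkwise DPLR-variant algorithm.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def kda_step(S: Matrix, q: Vector, k: Vector, v: Vector,
             alpha: Vector, beta: float) -> Tuple[Matrix, Vector]:
    """One recurrent step of a hypothetical KDA-style delta rule (sketch).

    S:     state, d_k rows x d_v columns
    alpha: assumed per-channel (per key dimension) decay in (0, 1)
    beta:  assumed scalar write strength in [0, 1]
    """
    d_k, d_v = len(S), len(S[0])
    # 1) Fine-grained decay: Diag(alpha) @ S scales each key channel separately.
    decayed = [[alpha[i] * S[i][j] for j in range(d_v)] for i in range(d_k)]
    # 2) Delta-rule correction, using the identity
    #    (I - beta k k^T) D + beta k v^T = D + beta k (v - D^T k)^T
    pred = [sum(k[i] * decayed[i][j] for i in range(d_k)) for j in range(d_v)]
    err = [v[j] - pred[j] for j in range(d_v)]
    S_new = [[decayed[i][j] + beta * k[i] * err[j] for j in range(d_v)]
             for i in range(d_k)]
    # 3) Readout: o = S_new^T q
    o = [sum(q[i] * S_new[i][j] for i in range(d_k)) for j in range(d_v)]
    return S_new, o


if __name__ == "__main__":
    S = [[0.0], [0.0]]
    k = [1.0, 0.0]  # unit-norm key, also used as the query
    S, o = kda_step(S, k, k, [3.0], [0.5, 0.5], 1.0)
    print(o)  # [3.0]
    S, o = kda_step(S, k, k, [5.0], [1.0, 1.0], 1.0)
    print(o)  # [5.0]: the stored 3.0 is overwritten, not accumulated
```

With a unit-norm key and β_t = 1, a second write to the same key replaces the stored value instead of adding to it, which is the classical delta-rule behavior. Setting every entry of α_t to the same value collapses Diag(α_t) to a scalar, recovering Gated DeltaNet-style decay.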
Kimi Linear: An Expressive, Efficient Attention Architecture
Kimi Linear, a hybrid linear attention architecture built from Kimi Delta Attention and Multi-Head Latent Attention, outperforms full attention across various scenarios while improving efficiency.
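The "up to 75%" KV cache reduction of a layerwise hybrid can be checked with simple arithmetic. The sketch below assumes a ratio of three KDA layers per MLA layer (consistent with the 75% figure, but an assumption here since this summary does not state it), that only the full-attention layers store a per-token cache, and that each KDA layer holds a fixed-size recurrent state. The function name `hybrid_kv_reduction` and all sizes are hypothetical units, not measurements.

```python
def hybrid_kv_reduction(n_linear: int, n_full: int, context_len: int,
                        cache_per_token: float, state_size: float) -> float:
    """Fraction of memory saved versus an all-full-attention stack (sketch).

    Hypothetical model: each full-attention layer stores `cache_per_token`
    units per token; each linear-attention layer stores a constant
    `state_size` units regardless of context length.
    """
    n_layers = n_linear + n_full
    full_stack = n_layers * context_len * cache_per_token
    hybrid_stack = (n_full * context_len * cache_per_token
                    + n_linear * state_size)
    return 1.0 - hybrid_stack / full_stack


if __name__ == "__main__":
    # Assumed 3:1 KDA-to-MLA ratio, fixed state ignored: exactly 3/4 saved.
    print(hybrid_kv_reduction(3, 1, 1_000_000, 1.0, 0.0))  # 0.75
```

Because the KDA state does not grow with context length, the saving approaches n_linear / (n_linear + n_full) from below as the context grows, which is why the figure is an upper bound ("up to"). Under the same model, a 1:1 ratio would cap the saving at 50%.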
- Year
- 2025
- Venue
- arXiv 2025
- Authors
- 60
Attribution
- Abstract & full text
- arxiv.org/abs/2510.26692
- TL;DR
- Semantic Scholar
Authors (60)
Longhui Yu, Yuhao Wu, Yu Zhang, Songlin Yang, Zhiyuan Li, Weiran He, Xinran Xu, Xinyu Zhou, Jianlin Su, Xingcheng Yao, Zhejun Jiang, Guokun Lai, Yulun Du, Weixin Xu, Enzhe Lu, Junjie Yan, Yanru Chen, Huabin Zheng, Yibo Liu, Shaowei Liu, Bohong Yin, Yuzhi Wang, Mengnan Dong, Zheng Zhang, Yuxin Wu, Zhilin Yang, Zhengtao Wang, Chu Wei, Feng Wang, Zongyu Lin, Chao Hong, Enming Yuan, Jiezhong Qiu, Kimi Team, Dehao Zhang, Weixiao Huang, Yejie Wang, Fanqing Meng, Wenhao Wu, Yutian Chen, Bo Pang, Haiming Wang, Siyuan Pan, Jiacheng You, Xin Men, Longguang Zhong, Yiwei Li, Yucheng Wang, Guanduo Chen, Jiaxi Hu, Chengyin Liu, Wentao Li, Weizhou Liu, Yu Fan, Yizhi Zhang, T. Y. Liu, Shengjun Fang, Longyu Guan, Jiawen Tao, Guohong Fu