Language model pretraining with next-token prediction has proved effective for scaling compute, but it is limited by the amount of available training data. Scaling reinforcement learning (RL) unlocks a new axis for the continued improvement of artificial intelligence, with the promise that large language models (LLMs) can scale their training data by learning to explore with rewards. However, prior published work has not produced competitive results. In light of this, we report on the training practice of Kimi k1.5, our latest multi-modal LLM trained with RL, including its RL training techniques, multi-modal data recipes, and infrastructure optimization. Long-context scaling and improved policy optimization methods are the key ingredients of our approach, which establishes a simple yet effective RL framework without relying on more complex techniques such as Monte Carlo tree search, value functions, and process reward models. Notably, our system achieves state-of-the-art reasoning performance across multiple benchmarks and modalities (e.g., 77.5 on AIME, 96.2 on MATH 500, 94th percentile on Codeforces, 74.9 on MathVista), matching OpenAI's o1. Moreover, we present effective long2short methods that use long-CoT techniques to improve short-CoT models, yielding state-of-the-art short-CoT reasoning results (e.g., 60.8 on AIME, 94.6 on MATH 500, 47.3 on LiveCodeBench) and outperforming existing short-CoT models such as GPT-4o and Claude Sonnet 3.5 by a large margin (up to +550%).
Kimi k1.5: Scaling Reinforcement Learning with LLMs
A multi-modal LLM trained with reinforcement learning achieves state-of-the-art reasoning performance across various benchmarks by utilizing long context scaling and effective policy optimization methods.
- Year
- 2025
- Venue
- arXiv 2025
- Authors
- 94
Attribution
- Abstract & full text
- arxiv.org/abs/2501.12599
- TL;DR
- Semantic Scholar
Authors
Hao Zhang, Longhui Yu, Xuehai Pan, Yang Li, Hao Yang, Cheng Li, Weiran He, Xinran Xu, Xinyu Zhou, Tao Jiang, Jingyuan Liu, Jianlin Su, Guokun Lai, Yulun Du, Yidao Qin, Enzhe Lu, Junjie Yan, Yanru Chen, Huabin Zheng, Yibo Liu, Shaowei Liu, Han Zhu, Yuzhi Wang, Jianzhou Wang, Mengnan Dong, Zheng Zhang, Yuxin Wu, Zhilin Yang, Y. Charles, Yangyang Liu, Ying Yang, Zaida Zhou, Haoyu Lu, Bofei Gao, Weimin Xiong, Enming Yuan, Zhiqi Huang, Huan Yuan, Jie Zhao, Kimi Team, Angang Du, Bowei Xing, Cheng Chen, Chenzhuang Du, Congcong Wang, Dehao Zhang, Flood Sung, Guangda Wei, Hao Ding, Hao Hu, Haotian Yao, Hongcheng Gao, Sihan Cao, Weixiao Huang, Xingzhe Wu, Xinxing Zu, Yangyang Hu, Yejie Wang, Yiping Bao, Zhaowei Li, Zihao Huang, Yifeng Liu, Wenhao Wu, Jin Zhang, Chenjun Xiao, Changjiu Jiang, Chonghua Liao, Chuning Tang, Fengxiang Tang, Haiqing Guo, Haotian Zhao, Haoze Li, Haozhen Yu, Jia Chen, Jianhang Guo, Junyan Wu, Lidong Shi, Ling Ye, Neo Zhang, Ningchen Ma, Qiwei Pan, Qucheng Gong, Shengling Ma, Shupeng Wei, Siying Huang, Weihao Gao, Wenyang He, Xianghui Wei, Xianqing Jia, Zhaoji Wang, Zhen Zhu, Zhexu Wang, Ziyao Xu, Zonghan Yang