DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Introduces DeepSeek-R1 and DeepSeek-R1-Zero, open-weight reasoning models trained primarily via large-scale reinforcement learning (GRPO) with verifiable rewards, reaching performance comparable to OpenAI's o1 on math and code benchmarks at a fraction of the cost.
- Publisher: DeepSeek
- Year: 2025
- Venue: preprint
- Authors: 199
- Hosting: External source, license unknown
Introduces 2 artifacts - 2 models
TL;DR
Semantic Scholar
A new artificial intelligence model, DeepSeek-R1, is introduced, demonstrating that the reasoning abilities of large language models can be incentivized through pure reinforcement learning, removing the need for human-annotated demonstrations.
Artifacts
2 Models
Authors
199
DeepSeek AI Team, Kexin Huang, Xin Liu, Wentao Zhang, Hui Li, Yi Yu, Jin Chen, Xinyu Yang, Chengqi Deng, Jiawei Wang, DeepSeek-AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chenyu Zhang, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Qu, J. L. Cai, Jian Liang, JianZhong Guo, Jiaqi Ni, Jiashi Li, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, Junxiao Song, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kuai Yu, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Litong Wang, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qiancheng Wang, Qihao Zhu, Qinyu Chen, Qiushi Du, R. J. Chen, R. L. Jin, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, Runxin Xu, Ruoyu Zhang, Ruyi Chen, S. S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Shuting Pan, T. Wang, Tao Yun, Tian Pei, Tianyu Sun, W. L. Xiao, Wangding Zeng, Wanjia Zhao, Wei An, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, X. Q. Li, Xiangyue Jin, Xianzu Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaojin Shen, Xiaokang Chen, Xiaokang Zhang, Xiaosha Chen, Xiaotao Nie, Xiaowen Sun, Xiaoxiang Wang, Xin Cheng, Xin Xie, Xingchao Liu, Xingkai Yu, Xinnan Song, Xinxia Shan, Xinyi Zhou, Xinyuan Li, Xuecheng Su, Xuheng Lin, Y. K. Li, Y. Q. Wang, Y. X. Wei, Y. X. Zhu, Yang Zhang, Yanhong Xu, Yanping Huang, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Li, Yaohui Wang, Yi Zheng, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Ying Tang, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yu Wu, Yuan Ou, Yuchen Zhu, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yukun Zha, Yunfan Xiong, Yunxian Ma, Yuting Yan, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Z. F. Wu, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhen Huang, Zhen Zhang, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhibin Gou, Zhicheng Ma, Zhigang Yan, Zhihong Shao, Zhipeng Xu, Zhiyu Wu, Zhongyu Zhang, Zhuoshu Li, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Ziyi Gao, Zizheng Pan