DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Introduces DeepSeek-R1 and DeepSeek-R1-Zero, open-weight reasoning models trained primarily via large-scale reinforcement learning with verifiable rewards (GRPO), reaching performance comparable to OpenAI's o1 on math and code at a fraction of the cost.

Publisher: DeepSeek
Year: 2025
Venue: preprint
Authors: 199
Hosting: External source, license unknown

Introduces 2 artifacts - 2 models

TL;DR

Semantic Scholar

A new artificial intelligence model, DeepSeek-R1, is introduced, demonstrating that the reasoning abilities of large language models can be incentivized through pure reinforcement learning, removing the need for human-annotated demonstrations.

Authors

199
DeepSeek AI Team, Kexin Huang, Xin Liu, Wentao Zhang, Hui Li, Yi Yu, Jin Chen, Xinyu Yang, Chengqi Deng, Jiawei Wang, DeepSeek-AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chenyu Zhang, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Qu, J. L. Cai, Jian Liang, JianZhong Guo, Jiaqi Ni, Jiashi Li, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, Junxiao Song, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kuai Yu, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Litong Wang, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qiancheng Wang, Qihao Zhu, Qinyu Chen, Qiushi Du, R. J. Chen, R. L. Jin, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, Runxin Xu, Ruoyu Zhang, Ruyi Chen, S. S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Shuting Pan, T. Wang, Tao Yun, Tian Pei, Tianyu Sun, W. L. Xiao, Wangding Zeng, Wanjia Zhao, Wei An, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, X. Q. Li, Xiangyue Jin, Xianzu Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaojin Shen, Xiaokang Chen, Xiaokang Zhang, Xiaosha Chen, Xiaotao Nie, Xiaowen Sun, Xiaoxiang Wang, Xin Cheng, Xin Xie, Xingchao Liu, Xingkai Yu, Xinnan Song, Xinxia Shan, Xinyi Zhou, Xinyuan Li, Xuecheng Su, Xuheng Lin, Y. K. Li, Y. Q. Wang, Y. X. Wei, Y. X. Zhu, Yang Zhang, Yanhong Xu, Yanping Huang, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Li, Yaohui Wang, Yi Zheng, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Ying Tang, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yu Wu, Yuan Ou, Yuchen Zhu, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yukun Zha, Yunfan Xiong, Yunxian Ma, Yuting Yan, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Z. F. Wu, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhen Huang, Zhen Zhang, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhibin Gou, Zhicheng Ma, Zhigang Yan, Zhihong Shao, Zhipeng Xu, Zhiyu Wu, Zhongyu Zhang, Zhuoshu Li, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Ziyi Gao, Zizheng Pan