DeepSeek R1-Zero

Open weights

DeepSeek's pure-RL reasoning model, trained directly from DeepSeek-V3-Base with large-scale GRPO and no SFT cold start. It is the proof of concept released alongside R1.
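
For readers unfamiliar with GRPO, here is a minimal sketch of its group-relative advantage step: several answers are sampled for one prompt, and each answer's reward is normalized against the mean and standard deviation of its own group, so no separate value model is needed. The function name, the `eps` term, the use of the population standard deviation, and the example rewards are all assumptions made for illustration; this is not DeepSeek's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    eps is an illustrative guard against division by zero when every
    sampled answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rule-based rewards for four sampled answers to one prompt
# (1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that beat their group's average get a positive advantage and the rest get a negative one; those values then weight the policy-gradient update.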

Publisher: DeepSeek
Family: DeepSeek R
Params: 671B total / 37B active
Context: 128K
Providers: self-host
License: MIT
Released: Jan 2025
Canonical: huggingface.co/deepseek-ai/DeepSeek-R1-Zero

Attribution

Benchmark scores: OpenReward

Reported eval scores: 1

Introduced in

Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning