DeepDive
Fresh
Prime Intellect's multi-turn, open-web research RL environment. The model uses a Serper search tool to answer hop-heavy, BrowseComp/SimpleQA-style questions; the reward is answer correctness as scored by a judge.
- Type
- RL Env
- Publisher
- Prime Intellect
- Capabilities
- Planning, Tool Calling, Browser Use, Retrieval
- Runtime
- verifiers
- License
- MIT
- Size
- 1 env; question pool derived from BrowseComp and SimpleQA
- Published
- May 2026
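The description above (multi-turn search, then a judge-scored answer) can be illustrated with a small sketch. This is not the environment's actual code and does not use the verifiers API: `run_episode`, `fake_search`, `scripted_policy`, and `exact_match_judge` are hypothetical names, the in-memory corpus stands in for the Serper search tool, a substring check stands in for the judge, and the binary 0/1 reward is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    question: str
    gold_answer: str
    # List of (query, results) pairs gathered during the rollout.
    transcript: list = field(default_factory=list)


def run_episode(episode, policy, search, judge, max_turns=4):
    """Roll out one multi-turn episode and return a scalar reward."""
    for _ in range(max_turns):
        action = policy(episode.question, episode.transcript)
        if action["type"] == "search":
            results = search(action["query"])
            episode.transcript.append((action["query"], results))
        else:
            # Final answer: the reward is the judge's correctness verdict.
            correct = judge(episode.question, episode.gold_answer, action["text"])
            return 1.0 if correct else 0.0
    return 0.0  # Turn budget exhausted without a final answer.


# Hypothetical stand-in for the search tool: a tiny in-memory corpus.
CORPUS = {
    "model x widget": "The Model X widget is made by Acme Corp.",
    "acme corp": "Acme Corp is headquartered in Springfield.",
}


def fake_search(query):
    return [text for key, text in CORPUS.items() if key in query.lower()]


# Hypothetical scripted policy: two search hops, then an answer.
def scripted_policy(question, transcript):
    if len(transcript) == 0:
        return {"type": "search", "query": "Model X widget maker"}
    if len(transcript) == 1:
        return {"type": "search", "query": "Acme Corp headquarters"}
    return {"type": "answer", "text": "Springfield"}


# Hypothetical stand-in for the judge: case-insensitive substring match.
def exact_match_judge(question, gold, predicted):
    return gold.strip().lower() in predicted.strip().lower()


episode = Episode(
    question="Which city hosts the headquarters of the Model X widget's maker?",
    gold_answer="Springfield",
)
reward = run_episode(episode, scripted_policy, fake_search, exact_match_judge)
```

In this sketch the second query can only be formed after reading the first result, which is what makes the question hop-heavy. A policy that keeps searching until the turn budget runs out receives a reward of 0.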