DeepDive

Fresh

Prime Intellect's multi-turn, open-web research RL environment. The model uses a Serper search tool to answer multi-hop questions in the style of BrowseComp and SimpleQA; the reward is answer correctness as scored by a judge.

Type: RL Env
Runtime: verifiers
License: MIT
Size: 1 env, BrowseComp + SimpleQA derived question pool
Published: May 2026
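
The episode loop described above (search over several turns, then a final answer scored by a judge) can be pictured with a minimal sketch. This is not the environment's actual code and does not use the verifiers or Serper APIs. Every name here (`Episode`, `run_episode`, `toy_search`, `toy_judge`, `scripted_policy`) and the `SEARCH:` action prefix are hypothetical. The search tool is stubbed with an in-memory lookup, and the judge is replaced by a normalized exact-match check standing in for whatever judge the real environment uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Episode:
    """One question with its reference answer and the tool-call history."""
    question: str
    reference: str
    transcript: List[str] = field(default_factory=list)


def toy_search(query: str) -> str:
    # Hypothetical stand-in for a web search tool: a tiny in-memory corpus.
    corpus = {"capital of france": "Paris is the capital of France."}
    return corpus.get(query.strip().lower(), "no results")


def toy_judge(question: str, answer: str, reference: str) -> bool:
    # Hypothetical stand-in for the judge: normalized exact match.
    return answer.strip().lower() == reference.strip().lower()


def scripted_policy(question: str, transcript: List[str]) -> str:
    # Hypothetical policy: search once, then commit to an answer.
    if not transcript:
        return "SEARCH: capital of France"
    return "Paris"


def run_episode(
    policy: Callable[[str, List[str]], str],
    search: Callable[[str], str],
    judge: Callable[[str, str, str], bool],
    episode: Episode,
    max_turns: int = 4,
) -> float:
    """Run up to max_turns; return 1.0 if the final answer is judged correct."""
    for _ in range(max_turns):
        action = policy(episode.question, episode.transcript)
        if action.startswith("SEARCH:"):
            query = action[len("SEARCH:"):].strip()
            result = search(query)
            episode.transcript.append(f"search({query!r}) -> {result}")
        else:
            # Any non-search action is treated as the final answer.
            correct = judge(episode.question, action, episode.reference)
            return 1.0 if correct else 0.0
    # Turn budget exhausted without a final answer.
    return 0.0


if __name__ == "__main__":
    ep = Episode(question="What is the capital of France?", reference="Paris")
    print(run_episode(scripted_policy, toy_search, toy_judge, ep))
```

In this sketch the reward is binary and only the final answer is scored; search turns contribute nothing on their own. Whether the real environment shapes reward differently is not stated in the listing.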
