Prime is a team.
Sandboxed Python-REPL harness for training models to manage their own context window across turns.
Multimodal aim training environment where agents click targets in images. Demonstrates visual reasoning with coordinate-based responses.
Just another SWE-grep environment
WebVoyager browser benchmark with filtered dataset (600 tasks from sites without anti-bot protection)
BrowserEnv demo for web browsing tasks using Browserbase
GSM8K environment
Reverse text character by character.
Multi-turn visual click calibration tasks with click and computer tool formats across pixel and normalized coordinate schemas.
Backdoor-ifeval env for inoculation experiments (pre-no-v version)
Backdoor-ifeval env with group-level reward monitors for within-batch advantage variance
Unified backdoor-ifeval env: difficulty, aggregation, no-v check, inoculation, group monitors
Reward hacking with deterministic IF constraints
V1 Taskset/Harness environment training LangChain deep-agents on Wikispeedia navigation
Chess environment where an agent plays as White against configurable opponents (random, LLM, or Stockfish)
Cross-repo code-search tasks over prime-rl, verifiers, vllm, pytorch
Stateful tool-based environment for constrained meeting scheduling
OpenRCA (ICLR 2025) root cause analysis benchmark environment for Verifiers
τ²-bench with custom synthetic domains (library, fitness_gym, tech_support, telecom, cloud_incident_response, daily_planner, ev_charging_support)
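
The click-based entries above (aim training and visual click calibration) mention pixel and normalized coordinate schemas. The sketch below illustrates what that distinction could look like in practice. It is not taken from either environment: the `Target` class, the function names, and the circular hit test are all hypothetical, and it assumes normalized coordinates lie in [0, 1] with the origin at the top-left corner of the image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical circular target, expressed in pixel coordinates."""
    cx: float
    cy: float
    radius: float


def normalized_to_pixel(x: float, y: float, width: int, height: int) -> tuple[float, float]:
    """Map normalized coordinates (assumed to be in [0, 1]) to pixel coordinates."""
    return x * width, y * height


def pixel_to_normalized(px: float, py: float, width: int, height: int) -> tuple[float, float]:
    """Map pixel coordinates to normalized coordinates in [0, 1]."""
    return px / width, py / height


def is_hit(px: float, py: float, target: Target) -> bool:
    """Return True if the pixel-space click falls inside or on the target circle."""
    dx, dy = px - target.cx, py - target.cy
    return dx * dx + dy * dy <= target.radius ** 2


if __name__ == "__main__":
    width, height = 800, 600                 # assumed image size
    target = Target(cx=405, cy=150, radius=10)

    # A click reported in the normalized schema is converted before scoring.
    px, py = normalized_to_pixel(0.5, 0.25, width, height)
    print(is_hit(px, py, target))
```

Converting every response to a single schema before scoring keeps the hit test independent of the format the agent used to report its click.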