SWE Atlas QnA
Fresh
Codebase QnA is the first benchmark in the SWE-Atlas suite. It evaluates AI agents on deep code comprehension: tracing execution paths, explaining architectural decisions, and answering detailed technical questions about production-grade software systems.
- Type: RL Env
- Publisher: General Reasoning
- Runtime: ORS
- License: unknown
- Size: 124 tasks
- Published: Apr 2026
Public scores on this env
23 vf-eval reports across 2 models