SWE Atlas QnA

Codebase QnA is the first benchmark in the SWE-Atlas suite. It evaluates AI agents on deep code comprehension: tracing execution paths, explaining architectural decisions, and answering detailed technical questions about production-grade software systems.

Type: RL Env
Runtime: ORS
License: unknown
Size: 124 tasks
Published: Apr 2026

Public scores on this env

3 vf-eval reports across 2 models
