
SWE Atlas QnA

Codebase QnA is the first benchmark in the SWE-Atlas suite. It evaluates AI agents on deep code comprehension: tracing execution paths, explaining architectural decisions, and answering detailed technical questions about production-grade software systems.

Domain: rl-env
License: unknown
Published: Apr 2026



Attribution
Leaderboard scores: OpenReward

Top score: 40.8 by GPT-5.4 (2 models reporting, 1 from a frontier lab).

Top models

[Bar chart: SWE Atlas QnA scores for 2 models; highest is GPT-5.4 at 40.8]

FAQ

What is SWE Atlas QnA?
Codebase QnA is the first benchmark in the SWE-Atlas suite. It evaluates AI agents on deep code comprehension: tracing execution paths, explaining architectural decisions, and answering detailed technical questions about production-grade software systems.
What is the current top score on SWE Atlas QnA?
The top reported score is 40.8, by GPT-5.4. Two models have reported scores, one of them from a frontier lab.
What license is SWE Atlas QnA under?
The license for SWE Atlas QnA is not listed (unknown).