Attribution
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
arXiv 2025
Welfare Diplomacy: Benchmarking Language Model Cooperation
arXiv 2023
from 2 papers
Alan Chan
Andrew Park
Bing Liu
Brad Kenstler
Charles Ide
Chen Bo Calvin Zhang
Chetan Rane
Edwin Pan
Gabriel Mukobi
Hannah Erlebach