SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?

OpenAI benchmark of 1,488 real Upwork freelance software-engineering tasks worth $1M in total payouts, including both individual contributor (IC) tasks and SWE-Manager decision tasks.

Publisher
OpenAI
Year
2025
Venue
preprint
Authors
4
Hosting
External source; license unknown

Introduces 1 artifact - 1 eval

TL;DR

Semantic Scholar

Introduces SWE-Lancer, a benchmark of over 1,400 freelance software engineering tasks from Upwork valued at $1 million USD total in real-world payouts, and finds that frontier models are still unable to solve the majority of tasks.
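
Because each task in the benchmark described above carries a real payout, a model's result can be read two ways: as the share of tasks solved, or as the dollars earned out of the total pool. The following is a minimal Python sketch of that distinction, not the benchmark's own scoring code. It assumes all-or-nothing grading, where a task pays its full price only when solved. The `TaskResult` class, its fields, the task IDs, and the dollar amounts are hypothetical and are not taken from the benchmark's data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    task_id: str       # hypothetical identifier
    kind: str          # "ic" or "manager"
    payout_usd: float  # price attached to the task
    solved: bool       # assumed all-or-nothing: True only if fully solved


def summarize(results: list[TaskResult]) -> dict[str, float]:
    """Return the pass rate and the payout earned for a group of results."""
    if not results:
        raise ValueError("results must not be empty")
    return {
        "pass_rate": sum(r.solved for r in results) / len(results),
        "earned_usd": sum(r.payout_usd for r in results if r.solved),
        "available_usd": sum(r.payout_usd for r in results),
    }


# Toy data: four made-up tasks, not drawn from the benchmark.
results = [
    TaskResult("ic-001", "ic", 250.0, True),
    TaskResult("ic-002", "ic", 1000.0, False),
    TaskResult("mgr-001", "manager", 500.0, True),
    TaskResult("mgr-002", "manager", 250.0, False),
]

# Report the whole set, then each task type on its own.
groups = {"all": results}
for kind in ("ic", "manager"):
    groups[kind] = [r for r in results if r.kind == kind]

for label, group in groups.items():
    s = summarize(group)
    print(
        f"{label:8s} pass_rate={s['pass_rate']:.0%} "
        f"earned=${s['earned_usd']:,.0f} of ${s['available_usd']:,.0f}"
    )
# all      pass_rate=50% earned=$750 of $2,000
# ic       pass_rate=50% earned=$250 of $1,250
# manager  pass_rate=50% earned=$500 of $750
```

On this toy data the pass rate is 50% in every group, yet only $750 of the $2,000 pool is earned, because one unsolved task carries the largest payout. A payout-weighted total can therefore rank results differently from a plain pass rate.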
