SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
OpenAI benchmark of 1,488 real Upwork freelance software-engineering tasks worth $1M in total payouts, including both individual contributor (IC) tasks and SWE-Manager decision tasks.
- Publisher: OpenAI
- Year: 2025
- Venue: preprint
- Authors: 4
- Hosting: External source, license unknown
Introduces 1 artifact - 1 eval
TL;DR
Semantic Scholar
SWE-Lancer is a benchmark of over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD in total real-world payouts. Frontier models are still unable to solve the majority of these tasks.
Artifacts
Evals: 1