code editing
- Slug: code-editing
- Evals: 7
- Tools: 17
- Models: 70
- Papers: 6
Evals testing this capability: 7
Tools lifting evals here: 17
Top models on this capability: 70 (by avg parsed score across evals here)
Papers in this area: 6
- introduces: Aider's Polyglot Coding Benchmark
- introduces: SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
- introduces: SWE-Gym: An Open Environment for Training Software Engineering Agents and Verifiers
- introduces: SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
- introduces: Terminal-Bench: A Benchmark for Real-World Terminal-Based Agents
- Introducing Terminal-Bench

