code generation
- Slug: code-generation
- Evals: 10
- Tools: 19
- Models: 446
- Papers: 8
Evals testing this capability: 10
Tools lifting evals here: 19
Top models on this capability: 446 (by avg parsed score across evals here)
Papers in this area: 8
- introduces: Aider's Polyglot Coding Benchmark
- introduces: Measuring AI Ability to Complete Long Tasks
- introduces: Evaluating Large Language Models Trained on Code (pass@k formulation)
- Evaluating Large Language Models Trained on Code
- Evaluating Large Language Models Trained on Code
- introduces: LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
- introduces: Program Synthesis with Large Language Models
- introduces: SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
