HumanEval+
Extended version of HumanEval with 80× more test cases (EvalPlus). The extra tests catch more edge-case bugs than the original HumanEval tests.
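Why more tests catch more bugs can be illustrated with a small Python sketch. The task, the function names (`buggy_max`, `correct_max`, `passes`, `failing_inputs`) and both test lists are hypothetical. They are not taken from HumanEval or the EvalPlus codebase, and EvalPlus generates its extra tests automatically rather than by hand as done here. The point is only that a solution can pass a small base suite and still fail once edge cases (here, an empty list and an all-negative list) are added.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical task (not from HumanEval): return the largest element of a
# list of integers, or None if the list is empty.
Test = Tuple[List[int], Optional[int]]


def buggy_max(xs: List[int]) -> Optional[int]:
    """Candidate with an edge-case bug: starting from 0 means it returns 0
    for the empty list and for lists whose elements are all negative."""
    best = 0
    for x in xs:
        if x > best:
            best = x
    return best


def correct_max(xs: List[int]) -> Optional[int]:
    """Reference solution that handles the empty list explicitly."""
    if not xs:
        return None
    return max(xs)


# A small "base" suite in the spirit of the original tests: no edge cases.
BASE_TESTS: List[Test] = [([1, 5, 3], 5), ([2, 2], 2)]

# An "extended" suite: the base tests plus edge cases.
EXTENDED_TESTS: List[Test] = BASE_TESTS + [
    ([], None),
    ([-3, -1, -7], -1),
    ([0], 0),
    ([7], 7),
]


def passes(candidate: Callable[[List[int]], Optional[int]], tests: List[Test]) -> bool:
    """True only if the candidate returns the expected output on every test."""
    return all(candidate(list(inp)) == expected for inp, expected in tests)


def failing_inputs(candidate: Callable[[List[int]], Optional[int]], tests: List[Test]) -> List[List[int]]:
    """Inputs on which the candidate's output differs from the expected one."""
    return [inp for inp, expected in tests if candidate(list(inp)) != expected]


if __name__ == "__main__":
    print("base:", passes(buggy_max, BASE_TESTS))          # True
    print("extended:", passes(buggy_max, EXTENDED_TESTS))  # False
    print("failing:", failing_inputs(buggy_max, EXTENDED_TESTS))  # [[], [-3, -1, -7]]
```

Under the base suite both candidates look correct; only the extended suite separates them, which is the kind of gap the additional HumanEval+ tests are meant to expose.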
- Publisher: EvalPlus Team
- Published: Apr 2023
- Canonical: github.com/evalplus/evalplus
Top score: 87.2% by Qwen2.5 Coder 32B Instruct (17 models reporting, 1 from a frontier lab).
FAQ
- What is HumanEval+?
- Extended HumanEval with 80× more test cases (EvalPlus). Catches more edge-case bugs than the original.
- What is the current top score on HumanEval+?
- The top reported score is 87.2% by Qwen2.5 Coder 32B Instruct, with 17 models reporting (1 from a frontier lab).
- How can a model improve its HumanEval+ score?
- Tools linked to HumanEval+ on Sophon include Humanevalplus RL Env (Community). Linked tools are RL environments, datasets, and scaffolds that target this eval.
