Question 1

What is HumanEval+?

Accepted Answer

HumanEval+ is an extended version of the HumanEval benchmark, released as part of EvalPlus, with 80× more test cases. The additional tests catch more edge-case bugs than the original.
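
To illustrate why extra test cases matter, here is a minimal sketch in Python. The task (`second_largest`), the buggy candidate solution, and both test sets are hypothetical and are not taken from HumanEval or HumanEval+. They only show how a solution can pass a small base test set and still fail once edge cases are added.

```python
from typing import Callable, Optional

def second_largest(nums: list[int]) -> Optional[int]:
    """Hypothetical candidate: return the second-largest distinct value.

    Contains a deliberate bug: duplicates are not handled.
    """
    if len(nums) < 2:
        return None
    return sorted(nums)[-2]

# Hypothetical test sets as (input, expected) pairs, invented for this sketch.
BASE_TESTS = [([1, 2, 3], 2), ([5, 1, 4], 4)]
EXTRA_TESTS = [([3, 3, 1], 1), ([7], None), ([2, 2], None)]

def passes(solution: Callable[[list[int]], Optional[int]], tests: list) -> bool:
    """Return True only if the solution matches every expected output."""
    return all(solution(list(args)) == expected for args, expected in tests)

base_ok = passes(second_largest, BASE_TESTS)                # small base suite
plus_ok = passes(second_largest, BASE_TESTS + EXTRA_TESTS)  # extended suite
print(base_ok, plus_ok)  # prints: True False
```

The candidate looks correct under the base tests, but the duplicate-value inputs in the extended set expose the bug. A larger test suite per problem makes this kind of false pass less likely.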

Question 2

What is the current top score on HumanEval+?

Accepted Answer

The top reported score is 87.2%, achieved by Qwen2.5 Coder 32B Instruct. A total of 17 models have reported scores, 1 of them from a frontier lab.

Question 3

How can a model improve its HumanEval+ score?

Accepted Answer

Sophon links tools that target this eval, such as RL environments, datasets, and scaffolds. The tools linked to HumanEval+ include Humanevalplus RL Env (Community).
