Mostly Basic Python Problems (MBPP)
Saturated
974 short crowd-sourced Python tasks with three unit tests each, used alongside HumanEval as a baseline code-generation benchmark.
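A task counts as solved only when the generated program passes all of its unit tests. The sketch below shows that check for one record. The task is invented and written in MBPP's style; it is not taken from the dataset. The field names (`text`, `test_list`) are assumed to match the Hugging Face release. The helper `passes_all_tests` is hypothetical and is not part of any official harness.

```python
# Hypothetical MBPP-style record (invented for illustration, not from the dataset).
# Field names are assumed to follow the Hugging Face release: `text`, `test_list`.
task = {
    "task_id": 0,
    "text": "Write a function to return the squares of the even numbers in a list.",
    "test_list": [
        "assert even_squares([1, 2, 3, 4]) == [4, 16]",
        "assert even_squares([]) == []",
        "assert even_squares([5, 7]) == []",
    ],
}

# Two invented model outputs: one correct, one that forgets to filter odd numbers.
candidate = """
def even_squares(nums):
    return [n * n for n in nums if n % 2 == 0]
"""
buggy = """
def even_squares(nums):
    return [n * n for n in nums]
"""


def passes_all_tests(candidate_src: str, tests: list[str]) -> bool:
    """Hypothetical helper: True only if the candidate passes every assert.

    WARNING: exec() runs arbitrary code in the current process. A real
    harness would run candidates in a sandboxed subprocess with a timeout.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)  # define the candidate function
        for test in tests:
            exec(test, namespace)  # each test is a bare assert statement
    except Exception:
        # Assertion failures, syntax errors and runtime errors all count as a miss.
        return False
    return True


print(passes_all_tests(candidate, task["test_list"]))  # True
print(passes_all_tests(buggy, task["test_list"]))  # False
```

The check is binary per task. Partial credit for passing two of the three tests is not given.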
- Publisher: Google Research
- Capabilities: Code Generation
- Domain: code
- Format: Hugging Face dataset
- Size: 974 tasks
- License: CC-BY-4.0
- Published: Aug 2021
- Notable for: Entry-level Python code-generation benchmark, commonly reported alongside HumanEval.
Top score: 100.0% by GPT-5 Nano (15 models reporting, 3 frontier)
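The page does not say which metric the score uses. MBPP results are commonly reported as pass@1, which is the share of tasks whose generated program passes all of that task's tests. The sketch below assumes that convention. It uses the unbiased pass@k estimator introduced with HumanEval (Chen et al., 2021), 1 − C(n−c, k)/C(n, k), where n is the number of samples per task and c is the number of correct samples. The per-task `results` values are invented, and `benchmark_score` is a hypothetical helper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: probability that at least one of k
    samples drawn without replacement from n generations (c correct) passes."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k draw must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Hypothetical aggregate: mean pass@k over tasks, as a percentage."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Invented (n, c) pairs for three tasks, 10 samples each.
results = [(10, 10), (10, 5), (10, 0)]
print(benchmark_score(results, k=1))  # 50.0
print(f"{benchmark_score(results, k=5):.2f}")  # 66.53
```

With k = 1 the estimator reduces to c/n, so a score of 100.0% means that every sampled program passed every task's tests.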
FAQ
- What is Mostly Basic Python Problems (MBPP)?
- 974 short crowd-sourced Python tasks with three unit tests each, used alongside HumanEval as a baseline code-generation benchmark.
- What capabilities does Mostly Basic Python Problems (MBPP) test?
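- Mostly Basic Python Problems (MBPP) evaluates code generation. The model receives a short natural-language description of a Python programming task and must write a function that passes the task's three unit tests.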
- What is the current top score on Mostly Basic Python Problems (MBPP)?
- The top reported score is 100.0%, achieved by GPT-5 Nano. Fifteen models have reported results, three of them from frontier labs.
- How can a model improve its Mostly Basic Python Problems (MBPP) score?
- Sophon links three tools to Mostly Basic Python Problems (MBPP): MBPP RL Env (Community), Openenv Coding RL Env (Meta FAIR, Fundamental AI Research), and MBPP Baseline RL Env (Community). Linked tools are RL environments, datasets, and scaffolds that target this eval.
- What license is Mostly Basic Python Problems (MBPP) under?
- Mostly Basic Python Problems (MBPP) is available under CC-BY-4.0.