MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

Active

Machine learning tasks drawn from 75 Kaggle competitions.

Publisher
OpenAI
Domain
Coding
License
MIT
Published
Feb 2025
Notable for
Benchmark for evaluating machine learning agents on machine learning engineering tasks (Coding domain).


Top score 20.0% by GPT-4.1 Mini - 1 model reporting (1 frontier)

Top models

[Bar chart: top models on MLE-bench, 1 model. Highest value: GPT-4.1 Mini at 20.0%.]

Related tools (2)

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering?
Machine learning tasks drawn from 75 Kaggle competitions.
What is the current top score on MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering?
The top reported score is 20.0% by GPT-4.1 Mini, across 1 model reporting (1 from frontier labs).
How can a model improve its MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering score?
Tools linked to MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering on Sophon include Mlebench RL Env (Community) and MLE Bench RL Env (Community). Linked tools are RL environments, datasets, or scaffolds that target this eval.
What license is MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering under?
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering is available under the MIT license.