Question 1

What is MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering?

Accepted Answer

MLE-bench is a benchmark that evaluates machine learning agents on machine learning engineering tasks drawn from 75 Kaggle competitions.

Question 2

What is the current top score on MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering?

Accepted Answer

The top reported score is 20.0%, achieved by GPT-4.1 Mini. Only one model has reported a score so far, and it is from a frontier lab.

Question 3

How can a model improve its MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering score?

Accepted Answer

Sophon links tools that target this eval, a category covering RL environments, datasets, and scaffolds. The tools linked to MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering include Mlebench RL Env (Community) and MLE Bench RL Env (Community).

Question 4

What license is MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering under?

Accepted Answer

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering is available under the MIT license.
