0

Bixbench

BixBench scientific reasoning evaluation environment

Domain
rl-env
License
mit
Published
Mar 2026

Cite

Notes

Only stored in your browser.

Attribution

Leaderboard scores
prime-hub
Attribution policy →

Top score 0.0% by GPT-4.1 Mini - 2 models reporting (2 frontier)

Top models

2
BixbenchBar chart with 2 bars. Highest value: DeepSeek Chat at 28.3.
2 models

Related tools

1
View all

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is Bixbench?
BixBench scientific reasoning evaluation environment
What is the current top score on Bixbench?
The top reported score is 0.0% by GPT-4.1 Mini, across 2 models reporting (2 from frontier labs).
How can a model improve its Bixbench score?
Tools linked to Bixbench on Sophon include Bixbench RL Env (Prime Community) - RL environments, datasets, and scaffolds that target this eval.
What license is Bixbench under?
Bixbench is available under mit.