
Xbench Scienceqa

A science question answering environment for evaluating scientific reasoning and problem-solving capabilities.

Domain: rl-env
License: unknown
Published: Nov 2025


Attribution

Leaderboard scores: prime-hub

Top score: 53.3% by Gemini 2.5 Pro, with 5 models reporting (2 from frontier labs).

Score history

[Chart: reported scores over time, y-axis 15% to 100%, x-axis Jun 25 to Aug 25; highlighted model: Gemini 2.5 Pro]

Top models

[Bar chart: Xbench Scienceqa scores for 5 models; highest value: Gemini 2.5 Pro at 53.3%]

Related tools


Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is Xbench Scienceqa?
A science question answering environment for evaluating scientific reasoning and problem-solving capabilities.
What is the current top score on Xbench Scienceqa?
The top reported score is 53.3% by Gemini 2.5 Pro, across 5 models reporting (2 from frontier labs).
How can a model improve its Xbench Scienceqa score?
Sophon links RL environments, datasets, and scaffolds that target this eval. For Xbench Scienceqa, these include Xbench Scienceqa RL Env (Community).
What license is Xbench Scienceqa under?
The license for Xbench Scienceqa is listed as unknown.