HealthBench

HealthBench tests how well AI models perform in realistic health scenarios, based on what physician experts say matters most.

Domain: rl-env
License: unknown
Published: Jan 2026

Attribution

Leaderboard scores: OpenReward

Top score: 57.6, by gpt-oss-120b (1 model reporting, 1 from a frontier lab).

Top models

1. gpt-oss-120b: 57.6

FAQ

What is HealthBench?
HealthBench tests how well AI models perform in realistic health scenarios, based on what physician experts say matters most.
What is the current top score on HealthBench?
The top reported score is 57.6, by gpt-oss-120b, with 1 model reporting (1 from a frontier lab).
What license is HealthBench under?
The license for HealthBench is not specified; it is listed as unknown.