HealthBench
HealthBench tests how well AI models perform in realistic health scenarios, based on what physician experts say matters most.
- Domain: rl-env
- License: unknown
- Published: Jan 2026
Top score: 57.6 by gpt-oss-120b (1 model reporting, 1 frontier)
FAQ
- What is HealthBench?
- HealthBench tests how well AI models perform in realistic health scenarios, based on what physician experts say matters most.
- What is the current top score on HealthBench?
- The top reported score is 57.6, by gpt-oss-120b. Only 1 model has reported a score (1 from a frontier lab).
- What license is HealthBench under?
- HealthBench's license is listed as unknown.