HealthBench
HealthBench tests how well AI models perform in realistic health scenarios, based on what physician experts say matters most.
- Type: RL Env
- Publisher: General Reasoning
- Runtime: ORS
- License: unknown
- Size: 5000 tasks
- Published: Jan 2026
Public scores on this env
12 vf-eval reports across 1 model