HealthBench

HealthBench tests how well AI models perform in realistic health scenarios, based on what physician experts say matters most.

Type: RL Env
Runtime: ORS
License: unknown
Size: 5000 tasks
Published: Jan 2026

Public scores on this env

2 vf-eval reports across 1 model
