HealthBench

HealthBench tests how well AI models perform in realistic health scenarios, based on what physician experts say matters most.

Type: RL Env
Publisher: General Reasoning
Tags: Healthcare Conversations
Runtime: ORS
License: unknown
Size: 5000 tasks
Published: Jan 2026
Canonical: openreward.ai/GeneralReasoning/HealthBench

Attribution

README: openreward.ai/GeneralReasoning/HealthBench
Scores: OpenReward

Public scores on this env

2 vf-eval reports across 1 model

1. gpt-oss-120b (OpenAI): 57.6
