HealthBench: Evaluating Large Language Models Towards Improved Human Health

Active

A comprehensive evaluation benchmark designed to assess language models' medical capabilities across a wide range of healthcare scenarios.

Publisher
OpenAI
Domain
Knowledge
License
MIT
Published
Jul 2025
Notable for
Benchmark for evaluating knowledge.

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is HealthBench: Evaluating Large Language Models Towards Improved Human Health?
A comprehensive evaluation benchmark designed to assess language models' medical capabilities across a wide range of healthcare scenarios.
How can a model improve its HealthBench: Evaluating Large Language Models Towards Improved Human Health score?
Sophon links this eval to tools that target it, including Healthbench RL Env (Medarc) and VF Openbench RL Env (Community). These are RL environments, datasets, and scaffolds that can be used to train and evaluate models against HealthBench.
What license is HealthBench: Evaluating Large Language Models Towards Improved Human Health under?
HealthBench: Evaluating Large Language Models Towards Improved Human Health is available under the MIT License.