HealthBench: Evaluating Large Language Models Towards Improved Human Health
Active
A comprehensive evaluation benchmark designed to assess language models' medical capabilities across a wide range of healthcare scenarios.
- Publisher: OpenAI
- Domain: Knowledge
- License: MIT
- Published: Jul 2025
- Notable for: Benchmark for evaluating Knowledge.
Related tools
2 related tools. Implementations, trainers, datasets and scaffolds linked to this eval.
FAQ
- What is HealthBench: Evaluating Large Language Models Towards Improved Human Health?
- A comprehensive evaluation benchmark designed to assess language models' medical capabilities across a wide range of healthcare scenarios.
- How can a model improve its HealthBench: Evaluating Large Language Models Towards Improved Human Health score?
- Tools linked to HealthBench: Evaluating Large Language Models Towards Improved Human Health on Sophon include Healthbench RL Env (Medarc) and VF Openbench RL Env (Community). These are RL environments, datasets, and scaffolds that target this eval.
- What license is HealthBench: Evaluating Large Language Models Towards Improved Human Health under?
- HealthBench: Evaluating Large Language Models Towards Improved Human Health is available under the MIT license.