HELM (Holistic Evaluation of Language Models)
Active
Stanford CRFM's wide-coverage evaluation framework: dozens of scenarios, each scored on accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency.
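Calibration is the least self-explanatory of these metrics. It asks whether a model's stated confidence matches how often it is actually right. One standard way to quantify it is expected calibration error (ECE). The sketch below is a generic, standalone illustration that assumes equal-width confidence bins (10 by default). It is not taken from the HELM codebase, and the function name `expected_calibration_error` is hypothetical.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    num_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin.

    Hypothetical helper for illustration; not HELM's implementation.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group (confidence, correctness) pairs into equal-width bins over [0, 1].
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(num_bins)]
    for p, y in zip(confidences, correct):
        if not 0.0 <= p <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        # p == 1.0 would map to index num_bins, so clamp it into the last bin.
        idx = min(int(p * num_bins), num_bins - 1)
        bins[idx].append((p, y))

    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        avg_conf = sum(p for p, _ in b) / len(b)
        acc = sum(1 for _, y in b if y) / len(b)
        # Weight each bin's calibration gap by its share of all predictions.
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece


if __name__ == "__main__":
    # Four predictions, each landing in its own bin: gaps 0.1, 0.8, 0.3, 0.4.
    print(expected_calibration_error([0.9, 0.8, 0.3, 0.6], [True, False, False, True]))
```

A score of 0 means confidence and accuracy agree in every bin. Larger values mean the model is over- or under-confident. Equal-width binning is only one design choice; other variants use equal-mass bins or a different bin count.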
- Format
- Custom
- License
- Apache-2.0
- Published
- Nov 2022
- Notable for
- Benchmark for evaluating factual recall, safety, and instruction following.
- Canonical
- crfm.stanford.edu/helm
FAQ
- What is HELM (Holistic Evaluation of Language Models)?
- Stanford CRFM's wide-coverage evaluation framework: dozens of scenarios, each scored on accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency.
- What capabilities does HELM (Holistic Evaluation of Language Models) test?
- HELM (Holistic Evaluation of Language Models) evaluates factual recall, safety, instruction following, and hallucination.
- What license is HELM (Holistic Evaluation of Language Models) under?
- HELM (Holistic Evaluation of Language Models) is available under Apache-2.0.