HELM (Holistic Evaluation of Language Models)

Active

Stanford CRFM's broad-coverage evaluation framework: dozens of scenarios, each scored on accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency.

Format: Custom
License: Apache-2.0
Published: Nov 2022
Notable for: Benchmark for evaluating factual recall, safety and instruction following.

Papers

2

FAQ

What is HELM (Holistic Evaluation of Language Models)?
HELM is Stanford CRFM's broad-coverage evaluation framework: dozens of scenarios, each scored on accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency.
What capabilities does HELM (Holistic Evaluation of Language Models) test?
HELM evaluates factual recall, safety, instruction following, and hallucination.
What license is HELM (Holistic Evaluation of Language Models) under?
HELM is released under the Apache-2.0 license.