HELM (Holistic Evaluation of Language Models)

Active

Stanford CRFM's wide-coverage evaluation framework, which scores dozens of scenarios on accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency.

Open

Publisher: Stanford Center for Research on Foundation Models (CRFM)
Capabilities: Factual Recall, Safety, Instruction Following, Hallucination
Format: Custom
License: Apache-2.0
Published: Nov 2022
Notable for: Benchmark for evaluating factual recall, safety and instruction following.
Canonical: crfm.stanford.edu/helm
Also on: github.com/stanford-crfm/helm

Papers

Holistic Evaluation of Language Models

TMLR · 2022

Introduces HELM, a framework that evaluates LLMs across 16 scenarios and 7 metrics (accuracy, calibration, robustness, fairness, bias, toxicity, efficiency) instead of a single number.
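The idea of reporting a grid of scenarios by metrics, rather than a single number, can be illustrated with a minimal sketch. This is not HELM's own code or API: the `ScenarioResult` class, the helper functions, the scenario names, and all score values below are hypothetical placeholders chosen for illustration. Only the seven metric names come from the summary above.

```python
from dataclasses import dataclass
from typing import Dict, List

# The seven metric categories named in the paper summary above.
METRICS = ("accuracy", "calibration", "robustness", "fairness",
           "bias", "toxicity", "efficiency")


@dataclass
class ScenarioResult:
    """Hypothetical container: one scenario's score for each reported metric."""
    scenario: str
    scores: Dict[str, float]


def missing_metrics(result: ScenarioResult) -> List[str]:
    """Return the metrics that were not reported for this scenario."""
    return [m for m in METRICS if m not in result.scores]


def report_rows(results: List[ScenarioResult]) -> List[List[str]]:
    """Build a scenario-by-metric table; unreported cells become '-'."""
    rows = [["scenario", *METRICS]]
    for r in results:
        cells = [f"{r.scores[m]:.2f}" if m in r.scores else "-" for m in METRICS]
        rows.append([r.scenario, *cells])
    return rows


# Placeholder values for illustration only, not real HELM results.
example = [
    ScenarioResult("scenario_a", {"accuracy": 0.50, "calibration": 0.10}),
    ScenarioResult("scenario_b", {"accuracy": 0.75, "toxicity": 0.01}),
]

for row in report_rows(example):
    print("  ".join(cell.ljust(11) for cell in row))
```

Keeping unreported cells visible as "-" instead of averaging them away makes gaps in coverage explicit, which a single aggregate score would hide.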

FAQ

What is HELM (Holistic Evaluation of Language Models)?: Stanford CRFM's wide-coverage evaluation framework, which scores dozens of scenarios on accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency.
What capabilities does HELM (Holistic Evaluation of Language Models) test?: HELM (Holistic Evaluation of Language Models) evaluates factual recall, safety, instruction following, and hallucination.
What license is HELM (Holistic Evaluation of Language Models) under?: HELM (Holistic Evaluation of Language Models) is available under Apache-2.0.