Holistic Evaluation of Language Models

Introduces HELM, a framework that evaluates LLMs across 16 scenarios and 7 metrics (accuracy, calibration, robustness, fairness, bias, toxicity, efficiency) instead of a single number.

Year: 2022
Venue: TMLR
Authors: 5
Hosting: External source, license unknown

Introduces 2 artifacts - 1 eval, 1 tool

TL;DR

Semantic Scholar

The paper presents Holistic Evaluation of Language Models (HELM) to improve the transparency of language models. The authors intend HELM to be a living benchmark for the community, continuously updated with new scenarios, metrics, and models.
