Holistic Evaluation of Language Models
Introduces HELM, a framework that evaluates LLMs across 16 scenarios and 7 metrics (accuracy, calibration, robustness, fairness, bias, toxicity, efficiency) instead of a single number.
- Year: 2022
- Venue: TMLR
- Authors: 5
- Hosting: External source, license unknown
Introduces 2 artifacts: 1 eval, 1 tool
TL;DR
Semantic Scholar
The paper presents Holistic Evaluation of Language Models (HELM) to improve the transparency of language models. The authors intend HELM to be a living benchmark for the community, continuously updated with new scenarios, metrics, and models.