HELM (Holistic Evaluation of Language Models)

Stanford CRFM's open-source Python framework for holistic, reproducible, transparent evaluation of foundation models across many benchmarks and metric axes.

Type: Framework
Publisher: Stanford Center for Research on Foundation Models (CRFM)
Tags: Reproducible Evaluation
Runtime: custom
License: Apache-2.0
Size: 50+ scenarios, 100+ supported models
Published: Nov 2021
Canonical: github.com/stanford-crfm/helm

Papers

introduces: Holistic Evaluation of Language Models