HELM (Holistic Evaluation of Language Models)
Stanford Center for Research on Foundation Models (CRFM)
Stanford CRFM's broad-coverage evaluation framework: dozens of scenarios, each scored on accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency.
Active | Factual Recall | Safety | Instruction Following