Dan Hendrycks

Director of the Center for AI Safety; lead author of MMLU and many of the field's most-used safety/capability benchmarks.

Role: director
Currently at: Center for AI Safety
Twitter: twitter.com/DanHendrycks
GitHub: github.com/hendrycks
Scholar: scholar.google.com/citations
Papers: 27

Attribution

Affiliations & profile: scholar.google.com/citations

27 papers · 5 eval contribs

Authored papers

27

Humanity's Last Exam

preprint

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems

arXiv 2025

TextQuests: How Good are LLMs at Text-Based Video Games?

arXiv 2025

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

arXiv 2024

Improving Alignment and Robustness with Circuit Breakers

arXiv 2024

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

arXiv 2024

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

arXiv 2024

Tamper-Resistant Safeguards for Open-Weight LLMs

arXiv 2024

Representation Engineering: A Top-Down Approach to AI Transparency

arXiv 2023

Can LLMs Follow Simple Rules?

arXiv 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

arXiv 2023

MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding

arXiv 2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

Forecasting Future World Events with Neural Networks

arXiv 2022

Measuring Mathematical Problem Solving With the MATH Dataset

NeurIPS

Measuring Coding Challenge Competence With APPS

arXiv 2021

CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review

arXiv 2021

Measuring Massive Multitask Language Understanding

ICLR

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

ICCV 2021

Aligning AI With Shared Human Values

arXiv 2020

Natural Adversarial Examples

CVPR 2021

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

ICLR 2019

AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

ICLR 2020

Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty

NeurIPS 2019

Deep Anomaly Detection with Outlier Exposure

ICLR 2019

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

arXiv 2016

Gaussian Error Linear Units (GELUs)

arXiv 2016

Eval contributions

5

HarmBench

University of California, Berkeley

Standardized red-teaming evaluation of 400 harmful behaviors across 18 attacks, scored by a fine-tuned classifier for attack success rate.

Active · Safety · Jailbreak Resistance

Humanity's Last Exam (HLE)

Center for AI Safety (CAIS)

2,500 expert-authored questions across math, sciences, and humanities designed to be the hardest closed-ended benchmark for frontier models.

Active · Scientific Reasoning · Math · Factual Recall

MATH-500

OpenAI

500-problem subset of the Hendrycks MATH competition-math benchmark, popularized by OpenAI's PRM800K work as a standard evaluation slice.

Saturated · Math · Planning

MATH

University of California, Berkeley

12,500 high-school competition math problems with full LaTeX-formatted step-by-step solutions, spanning algebra through number theory.

Massive Multitask Language Understanding (MMLU)

University of California, Berkeley

57-subject multiple-choice exam testing broad world knowledge and reasoning across academic and professional domains.

Saturated · Factual Recall · Scientific Reasoning

Affiliations

Currently at

Center for AI Safety

director · nonprofit

Previously

University of California, Berkeley · university lab

Frequent co-authors

10

from 27 papers

Dawn Song

professor

11 shared papers

Mantas Mazeika

researcher

11 shared papers

Andy Zou

founder

10 shared papers

Steven Basart

researcher

9 shared papers

Jacob Steinhardt

founder

7 shared papers

Long Phan

researcher

6 shared papers

Collin Burns

researcher

5 shared papers

Nathaniel Li

grad-student

4 shared papers

Norman Mu

4 shared papers

Saurav Kadavath

researcher

4 shared papers