Dan Hendrycks
Director of the Center for AI Safety; lead author of MMLU and many of the field's most-used safety/capability benchmarks.
- Role: director
- Currently at: Center for AI Safety
- Twitter: twitter.com/DanHendrycks
- GitHub: github.com/hendrycks
- Scholar: scholar.google.com/citations
- Papers: 27
Authored papers (27)
Humanity's Last Exam
preprint
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
arXiv 2025
TextQuests: How Good are LLMs at Text-Based Video Games?
arXiv 2025
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
arXiv 2024
Improving Alignment and Robustness with Circuit Breakers
arXiv 2024
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
arXiv 2024
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
arXiv 2024
Tamper-Resistant Safeguards for Open-Weight LLMs
arXiv 2024
Representation Engineering: A Top-Down Approach to AI Transparency
arXiv 2023
Can LLMs Follow Simple Rules?
arXiv 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
arXiv 2023
MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding
arXiv 2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Forecasting Future World Events with Neural Networks
arXiv 2022
Measuring Mathematical Problem Solving With the MATH Dataset
NeurIPS
CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review
arXiv 2021
Measuring Coding Challenge Competence With APPS
arXiv 2021
Measuring Massive Multitask Language Understanding
ICLR
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
ICCV 2021
Aligning AI With Shared Human Values
arXiv 2020
Natural Adversarial Examples
CVPR 2021
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
ICLR 2019
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
ICLR 2020
Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty
NeurIPS 2019
Deep Anomaly Detection with Outlier Exposure
ICLR 2019
A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks
arXiv 2016
Gaussian Error Linear Units (GELUs)
arXiv 2016
Eval contributions (5)
Affiliations
Frequent co-authors
10 from 27 papers
Dawn Song
professor
Mantas Mazeika
researcher
Andy Zou
founder
Steven Basart
researcher
Jacob Steinhardt
founder
Long Phan
researcher
Collin Burns
researcher
Nathaniel Li
grad-student
Norman Mu
Saurav Kadavath
researcher