Dan Hendrycks
Director of the Center for AI Safety; lead author of MMLU and many of the field's most-used safety/capability benchmarks.
- Role: director
- Currently at: Center for AI Safety
- Twitter: twitter.com/DanHendrycks
- GitHub: github.com/hendrycks
- Scholar: scholar.google.com/citations
- Papers: 27
Authored papers
27
Humanity's Last Exam
preprint
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
arXiv 2025
TextQuests: How Good are LLMs at Text-Based Video Games?
arXiv 2025
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
arXiv 2024
Improving Alignment and Robustness with Circuit Breakers
arXiv 2024
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
arXiv 2024
Tamper-Resistant Safeguards for Open-Weight LLMs
arXiv 2024
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
arXiv 2024
Representation Engineering: A Top-Down Approach to AI Transparency
arXiv 2023
Can LLMs Follow Simple Rules?
arXiv 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
arXiv 2023
MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding
arXiv 2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Forecasting Future World Events with Neural Networks
arXiv 2022
Measuring Mathematical Problem Solving With the MATH Dataset
NeurIPS
Measuring Coding Challenge Competence With APPS
arXiv 2021
CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review
arXiv 2021
Measuring Massive Multitask Language Understanding
ICLR
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
ICCV 2021
Aligning AI With Shared Human Values
arXiv 2020
Natural Adversarial Examples
CVPR 2021
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
ICLR 2020
Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty
Deep Anomaly Detection with Outlier Exposure
A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks
arXiv 2016
Gaussian Error Linear Units (GELUs)
arXiv 2016
Eval contributions
5
Affiliations
Frequent co-authors
10 from 27 papers
Dawn Song
professor
Mantas Mazeika
researcher
Andy Zou
founder
Steven Basart
researcher
Jacob Steinhardt
founder
Long Phan
researcher
Collin Burns
researcher
Nathaniel Li
grad-student
Norman Mu
Saurav Kadavath
researcher