Dan Hendrycks

Director of the Center for AI Safety; lead author of MMLU and many of the field's most-used safety/capability benchmarks.

Role
director
Papers
27

27 papers · 5 eval contribs

Authored papers

27

Humanity's Last Exam

preprint

2025

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems

arXiv 2025

2025

TextQuests: How Good are LLMs at Text-Based Video Games?

arXiv 2025

2025

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

arXiv 2024

2024

Improving Alignment and Robustness with Circuit Breakers

arXiv 2024

2024

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

arXiv 2024

2024

Tamper-Resistant Safeguards for Open-Weight LLMs

arXiv 2024

2024

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

arXiv 2024

2024

Representation Engineering: A Top-Down Approach to AI Transparency

arXiv 2023

2023

Can LLMs Follow Simple Rules?

arXiv 2023

2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

arXiv 2023

2023

MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding

arXiv 2023

2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

2022

Forecasting Future World Events with Neural Networks

arXiv 2022

2022

Measuring Mathematical Problem Solving With the MATH Dataset

NeurIPS

2021

Measuring Coding Challenge Competence With APPS

arXiv 2021

2021

CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review

arXiv 2021

2021

Measuring Massive Multitask Language Understanding

ICLR

2020

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

ICCV 2021

2020

Aligning AI With Shared Human Values

arXiv 2020

2020

Natural Adversarial Examples

CVPR 2021

2019

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

ICLR 2019

2019

AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

ICLR 2020

2019

Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty

NeurIPS 2019

2019

Deep Anomaly Detection with Outlier Exposure

ICLR 2019

2018

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

arXiv 2016

2016

Gaussian Error Linear Units (GELUs)

arXiv 2016

2016

Eval contributions

5

Affiliations

Frequent co-authors

10

from 27 papers