Steven Basart
Researcher at Center for AI Safety; co-author of MMLU, MATH, and multiple safety benchmarks; longtime collaborator of Dan Hendrycks.
- Role: researcher
- Currently at: Center for AI Safety
- Twitter: twitter.com/xksteven
- GitHub: github.com/xksteven
- Scholar: scholar.google.com/citations
- Papers: 9
Authored papers (9)
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
arXiv 2024
Representation Engineering: A Top-Down Approach to AI Transparency
arXiv 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
arXiv 2023
Measuring Mathematical Problem Solving With the MATH Dataset
NeurIPS
Measuring Coding Challenge Competence With APPS
arXiv 2021
Measuring Massive Multitask Language Understanding
ICLR
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
ICCV 2021
Aligning AI With Shared Human Values
arXiv 2020
Natural Adversarial Examples
CVPR 2021
Frequent co-authors
10 from 9 papers
Dan Hendrycks
director
Dawn Song
professor
Jacob Steinhardt
founder
Andy Zou
founder
Collin Burns
researcher
Mantas Mazeika
researcher
Nathaniel Li
grad-student
Saurav Kadavath
researcher
Akul Arora
researcher
Alexander Pan