Nathaniel Li
UC Berkeley / CAIS researcher; first author of the WMDP benchmark; co-author of Humanity's Last Exam.
- Role: grad student
- Currently at: University of California, Berkeley
- Twitter: twitter.com/nathanielwli
- Scholar: scholar.google.com/citations
- Papers: 4
Authored papers
Humanity's Last Exam
preprint
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
arXiv 2024
Representation Engineering: A Top-Down Approach to AI Transparency
arXiv 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
arXiv 2023
Frequent co-authors
10 from 4 papers
Dan Hendrycks, director
Andy Zou, founder
Long Phan, researcher
Steven Basart, researcher
Alexander Pan
Mantas Mazeika, researcher
Xuwang Yin
Zifan Wang
Alex Mallen
Alexandr Wang, CEO