Mantas Mazeika
Center for AI Safety researcher; lead author of HarmBench and contributor to MMLU and frontier-risk evaluations.
- Role: researcher
- Currently at: Center for AI Safety
- Twitter: twitter.com/mmazeika
- GitHub: github.com/mmazeika
- Scholar: scholar.google.com/citations
- Papers: 12
Authored papers (12)
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
arXiv 2025
TextQuests: How Good are LLMs at Text-Based Video Games?
arXiv 2025
MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
arXiv 2025
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
arXiv 2024
Tamper-Resistant Safeguards for Open-Weight LLMs
arXiv 2024
Representation Engineering: A Top-Down Approach to AI Transparency
arXiv 2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Forecasting Future World Events with Neural Networks
arXiv 2022
Measuring Coding Challenge Competence With APPS
arXiv 2021
Measuring Massive Multitask Language Understanding
ICLR
Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty
Deep Anomaly Detection with Outlier Exposure
Affiliations
Frequent co-authors
10 from 12 papers
Dan Hendrycks
director
Andy Zou
founder
Dawn Song
professor
Long Phan
researcher
Steven Basart
researcher
Jacob Steinhardt
founder
Xuwang Yin
Alice Gatti
researcher
Bo Li
Collin Burns
researcher