Rowan Wang

Attribution

3 papers

Authored papers

Eliciting Secret Knowledge from Language Models

arXiv 2025

Improving Alignment and Robustness with Circuit Breakers

arXiv 2024

Tamper-Resistant Safeguards for Open-Weight LLMs

arXiv 2024

No known affiliations.

from 3 papers

Andy Zou

founder

Dan Hendrycks

director

Justin Wang

Long Phan

researcher

Maxwell Lin

Alice Gatti

researcher

Andy Zhou

Arthur Conmy

Bartosz Cywiński

Bhrugu Bharathi