Attribution
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
arXiv 2026
Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning
arXiv 2025
from 2 papers
Neel Nanda
researcher
Samuel Marks
Adam Karvonen
Arya Jakkli
Bartosz Cywiński
Caden Juang
Khoi Tran
Senthooran Rajamanoharan