Andy Arditi
5 papers
Authored papers
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
arXiv 2025
Adversarial Manipulation of Reasoning Models using Internal Representations
arXiv 2025
Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs
arXiv 2025
Inverse Scaling in Test-Time Compute
arXiv 2025
Refusal in Language Models Is Mediated by a Single Direction
arXiv 2024
Affiliations
No known affiliations.
Frequent co-authors
10 from 5 papers