Andy Arditi

Papers: 5

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Persona Vectors: Monitoring and Controlling Character Traits in Language Models

arXiv 2025

Adversarial Manipulation of Reasoning Models using Internal Representations

arXiv 2025

Inverse Scaling in Test-Time Compute

arXiv 2025

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs

arXiv 2025

Refusal in Language Models Is Mediated by a Single Direction

arXiv 2024

Affiliations

No known affiliations.

Frequent co-authors

from 5 papers

Henry Sleight

2 shared papers

Owain Evans

founder

Runjin Chen

Aaquib Syed

Alexander Hägele

Anna Sztyber-Betley

Aryo Pradipta Gema

Beatrice Alex

Benjamin Etheridge

Daniel Paleka