Henry Sleight

Papers: 7

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Persona Vectors: Monitoring and Controlling Character Traits in Language Models

arXiv 2025

Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs

arXiv 2025

Inverse Scaling in Test-Time Compute

arXiv 2025

Looking Inward: Language Models Can Learn About Themselves by Introspection

arXiv 2024

Best-of-N Jailbreaking

arXiv 2024

Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

arXiv 2024

Rapid Response: Mitigating LLM Jailbreaks with a Few Examples

arXiv 2024

Affiliations

No known affiliations.

Frequent co-authors

from 7 papers

Ethan Perez

Aengus Lynch

Andy Arditi

Aryo Pradipta Gema

Erik Jones

Jacob Goldman-Wetzler

2 shared papers

John Hughes

2 shared papers

Julian Michael

researcher

2 shared papers

Mrinank Sharma

2 shared papers

Owain Evans

founder

2 shared papers