Henry Sleight
7 papers
Authored papers
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
arXiv 2025
Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs
arXiv 2025
Inverse Scaling in Test-Time Compute
arXiv 2025
Best-of-N Jailbreaking
arXiv 2024
Looking Inward: Language Models Can Learn About Themselves by Introspection
arXiv 2024
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
arXiv 2024
Rapid Response: Mitigating LLM Jailbreaks with a Few Examples
arXiv 2024
Affiliations
No known affiliations.
Frequent co-authors
10 from 7 papers
Ethan Perez
Aengus Lynch
Andy Arditi
Aryo Pradipta Gema
Erik Jones
Jacob Goldman-Wetzler
John Hughes
Julian Michael (researcher)
Mrinank Sharma
Owain Evans (founder)