Jacob Hilton
President of the Alignment Research Center; ex-OpenAI safety researcher working on RLHF, scalable oversight, and ARC's theoretical alignment agenda.
- Role: researcher
- Currently at: Alignment Research Center (ARC)
- Twitter: twitter.com/JacobHHilton
- GitHub: github.com/jacobhilton
- Scholar: scholar.google.com/citations
- Papers: 7
Authored papers (7)
Obfuscated Activations Bypass LLM Latent-Space Defenses
arXiv 2024
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Training language models to follow instructions with human feedback
NeurIPS
Teaching Models to Express Their Uncertainty in Words
arXiv 2022
Training Verifiers to Solve Math Word Problems
preprint
TruthfulQA: Measuring How Models Mimic Human Falsehoods
ACL
Batch size-invariance for policy optimization
Eval contributions (1)
Affiliations
Previously
Frequent co-authors (10, from 7 papers)
John Schulman
co-founder
Owain Evans
founder
Stephanie Lin
researcher
Alex Ray
researcher
Amanda Askell
researcher
Karl Cobbe
research-scientist
Aarohi Srivastava
researcher
Abhay Sheshadri
Abhinav Rastogi
researcher
Abhishek Rao