Geoffrey Irving
4 papers
Authored papers
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
arXiv 2025
Scalable AI Safety via Doubly-Efficient Debate
arXiv 2023
Fine-Tuning Language Models via Epistemic Neural Networks
arXiv 2022
Fine-Tuning Language Models from Human Preferences
arXiv 2019
Affiliations
No known affiliations.
Frequent co-authors
10 from 4 papers