Tomek Korbak

Attribution

4 papers

Authored papers

Reasoning Models Struggle to Control their Chains of Thought

arXiv 2026

Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs

arXiv 2025

Async Control: Stress-testing Asynchronous Control Measures for LLM Agents

arXiv 2025

Looking Inward: Language Models Can Learn About Themselves by Introspection

arXiv 2024

No known affiliations.

from 4 papers

Alan Cooney

Arathi Mani

Asa Cooper Stickland

researcher

Bowen Baker

researcher

Bruce W. Lee

Charlie Griffin

Chen Yueh-Han

Ethan Perez

Felix J Binder

Geoffrey Irving