Bartosz Cywiński

Papers: 5

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation

arXiv 2026

SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders

arXiv 2025

Eliciting Secret Knowledge from Language Models

arXiv 2025

Towards eliciting latent knowledge from LLMs with mechanistic interpretability

arXiv 2025

GUIDE: Guidance-based Incremental Learning with Diffusion Models

arXiv 2024

Affiliations

No known affiliations.

Frequent co-authors

from 5 papers

Neel Nanda

researcher

Emil Ryd

Kamil Deja

Samuel Marks

Senthooran Rajamanoharan

2 shared papers

Arthur Conmy

1 shared paper

Arya Jakkli

1 shared paper

Bartłomiej Twardowski

1 shared paper

Helena Casademunt

1 shared paper

Khoi Tran

1 shared paper