David Krueger

Papers: 7

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Towards Interpreting Visual Information Processing in Vision-Language Models

arXiv 2024

Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders

arXiv 2024

Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders

arXiv 2024

Reward Model Ensembles Help Mitigate Overoptimization

arXiv 2023

Implicit meta-learning may lead language models to trust more reliable sources

arXiv 2023

Interpreting Learned Feedback Patterns in Large Language Models

arXiv 2023

Mechanistic Mode Connectivity

arXiv 2022

Affiliations

No known affiliations.

Frequent co-authors

from 7 papers

Fazl Barez

Philip Torr

Clement Neo

Luke Marks

Alasdair Paren

Amir Abdullah

Ashkan Khakzar

Austin Meek

Bruno Mlodozeniec

Dmitrii Krasheninnikov

1 shared paper