David Krueger
7 papers
Authored papers
Towards Interpreting Visual Information Processing in Vision-Language Models
arXiv 2024
Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders
arXiv 2024
Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders
arXiv 2024
Reward Model Ensembles Help Mitigate Overoptimization
arXiv 2023
Implicit meta-learning may lead language models to trust more reliable sources
arXiv 2023
Interpreting Learned Feedback Patterns in Large Language Models
arXiv 2023
Mechanistic Mode Connectivity
arXiv 2022
Affiliations
No known affiliations.
Frequent co-authors
10 from 7 papers