Logan Smith

Attribution

3 papers

Authored papers

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models

arXiv 2024

Eliciting Latent Predictions from Transformers with the Tuned Lens

arXiv 2023

Researching Alignment Research: Unsupervised Analysis

arXiv 2022

No known affiliations.

Co-authors (from 3 papers)

Adam Karvonen

Benjamin Wright

Can Rager

Claudio Mayrink Verdun

Danny Halawi

David Bau

Igor Ostrovsky

Jacob Steinhardt

founder

Jacques Thibodeau

Jan H. Kirchner