Rico Angell

3 papers

Authored papers

Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors

arXiv 2025

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models

arXiv 2024

Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization

arXiv 2022

No known affiliations.

Co-authors (from 3 papers)

Adam Karvonen

Andrew McCallum

Benjamin Wright

Can Rager

Chen Yueh-Han

Claudio Mayrink Verdun

David Bau

He He (professor)

Jannik Brinkmann

Logan Smith