Severin Field


2 papers

Authored papers

Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals

arXiv 2024

Meta-Models: An Architecture for Decoding LLM Behaviors Through Interpreted Embeddings and Natural Language

arXiv 2024

No known affiliations.

Co-authors (from 2 papers)

Anthony Costarelli

Caden Juang

Joshua Clymer

Mat Allen