Attribution
LLM Safety From Within: Detecting Harmful Content with Internal Representations
arXiv 2026
Ashton Anderson
Difan Jiao
Haolun Wu
Ye Yuan
Yilun Liu
Zhenwei Tang