Linfeng Du

Attribution

1 paper

Authored papers

LLM Safety From Within: Detecting Harmful Content with Internal Representations

arXiv 2026

No known affiliations.

from 1 paper

Ashton Anderson

Difan Jiao

Haolun Wu

Ye Yuan

Yilun Liu

Zhenwei Tang