Attribution
Representation Bending for Large Language Model Safety
arXiv 2025
Alvin Wan
Ashkan Yousefpour
Harrison Ngan
Jonghyun Choi
Seungbeen Lee
Seungju Han
Taeheon Kim
Wonje Jeung
Youngjae Yu