Uncovering Safety Risks of Large Language Models through Concept Activation Vector
arXiv 2024
Changyu Chen
Xiting Wang
Zhihao Xu