Ruixuan Huang

Attribution

1 paper

Authored papers

Uncovering Safety Risks of Large Language Models through Concept Activation Vector

arXiv 2024

No known affiliations.

from 1 paper

Changyu Chen

Xiting Wang

Zhihao Xu