Rongwu Xu

4 papers

Authored papers

Course-Correction: Safety Alignment Using Synthetic Preferences

arXiv 2024

Knowledge Conflicts for LLMs: A Survey

arXiv 2024

On the Role of Attention Heads in Large Language Model Safety

arXiv 2024

How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States

arXiv 2024

No known affiliations.

Co-authors from 4 papers

Zhenhong Zhou

Fei Huang

Haiyang Yu

Wei Xu

Xinghua Zhang

Yongbin Li

Cunxiang Wang

Haiqin Weng

Han Qiu

Hongru Wang