Rongwu Xu
4 papers
Authored papers
Course-Correction: Safety Alignment Using Synthetic Preferences
arXiv 2024
Knowledge Conflicts for LLMs: A Survey
arXiv 2024
On the Role of Attention Heads in Large Language Model Safety
arXiv 2024
How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
arXiv 2024
Affiliations
No known affiliations.
Frequent co-authors
10 from 4 papers