Attribution
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
arXiv 2024
Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
from 2 papers
Chao Yang
Jie Liu
Yu Qiao
Zhanhui Zhou
Jiaheng Liu
Wanli Ouyang
Zhixuan Liu