Attribution
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges (arXiv 2026)
Multi-Programming Language Sandbox for LLMs (arXiv 2024)
Aligning Large Language Models with Human Preferences through Representation Engineering (arXiv 2023)
from 3 papers
Changze Lv
Xuanjing Huang
Shihan Dou
Tao Gui
Wenhao Liu
Xiaohua Wang
Xiaoqing Zheng
Bowen Chen
Cenyuan Zhang
Feiran Zhang