Attribution
CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning
arXiv 2025
Let it Calm: Exploratory Annealed Decoding for Verifiable Reinforcement Learning
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
from 3 papers
Zhuokai Zhao
Jiayi Liu
Xiangjun Fan
Baosheng He
Chaoqi Wang
Chen Zhu
Chenghao Yang
Chenxiao Yang
Hanchao Yu
Hao Ma