Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States
arXiv 2026
Jeonghoon Shim
Jongwon Lim
Minjae Oh
Yohan Jo
Yunho Choi