Attribution
AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents
arXiv 2026
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
arXiv 2025
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
arXiv 2024
from 3 papers
Yankai Lin
Ruobing Xie
Wenkai Yang
Bowen Sun
Ganqu Cui
researcher
Haotian Chen
Huimin Chen
Jie Zhou
Jingwen Chen
Lifan Yuan
grad-student