Papers

Trending research and the full catalog, with each paper linked to the benchmarks, methods, and models it introduces.

Filtered by domain: question-answering

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

25 Jun 2026

Outcome-based reinforcement learning provides a stable optimization backbone for language agents, but its sparse trajectory-level rewards offer little guidance on which intermediate decisions should be reinforced or suppressed.

Question Answering, Reinforcement Learning
