Attribution
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
arXiv 2025
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
from 2 papers
Yuandong Tian
Bo Liu
researcher
Cai Zhou
Chenyu Wang
DiJia Su
Feiyu Chen
Jason Weston
Paria Rashidinejad
Sainbayar Sukhbaatar
Sergey Levine
professor