Attribution
CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR
arXiv 2026
Search Self-play: Pushing the Frontier of Agent Capability without Supervision
arXiv 2025
from 2 papers
Guanjun Jiang
Pengyu Cheng
Chutian Wang
Guojun Zhang
Haonan Chen
Haotian Xu
Hongliang Lu
Jiajun Song
Jianhe Lin
Jiaqi Guo