Attribution
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
arXiv 2025
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
from 2 papers
Wentian Zhao
Bo Liu
researcher
Hai Helen Li
Jin Can
Jin Mingyu
Jing Shi
Li Yu-Jhe
Mei Kai
Metaxas Dimitris
Qinsi Wang