Attribution
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
arXiv 2025
Discrete Markov Bridge
FlowRL: Matching Reward Distributions for LLM Reasoning
How to Synthesize Text Data without Model Collapse?
arXiv 2024
from 4 papers
Xuekai Zhu
Zilong Zheng
Bowen Zhou
professor
Daixuan Cheng
Ermo Hua
Kaiyan Zhang
Ning Ding
researcher
Song-Chun Zhu
Xingtai Lv
Ying Nian Wu