Attribution
RelayLLM: Efficient Reasoning via Collaborative Decoding
arXiv 2026
Process Rewards with Learned Reliability
Training Data Efficiency in Multimodal Process Reward Models
G-Zero: Self-Play for Open-Ended Generation from Zero Data
from 4 papers
Chengsong Huang
Jiaxin Huang
Langlin Huang
Haolin Liu
Shaoyang Xu
Tong Zheng
Wenxuan Zhang
Donghong Cai
Runpeng Dai
Yu Meng