Offline Reinforcement Learning for LLM Multi-Step Reasoning
arXiv 2024
Hanze Dong
Shenao Zhang
Shibo Hao
Yi Wu
Yilin Bao
Ziran Yang