Attribution
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
arXiv 2024
Jiaya Jia
Senqiao Yang
Xin Lai
Yukang Chen
Zhuotao Tian