Attribution
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
arXiv 2025
Out-of-Dynamics Imitation Learning from Multimodal Demonstrations
arXiv 2022
When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning
from 3 papers
Guyue Zhou
Haoyi Niu
Jialong Wu
Jian Shao
Jianming Hu
Jun Xiao
Linjuan Wu
Ming Li
Mingsheng Long
Shangke Lyu