Attribution
Training Language Models to Reason Efficiently
arXiv 2025
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Fast Best-of-N Decoding via Speculative Rejection
arXiv 2024
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Authors (from 4 papers)
Aviral Kumar
Daman Arora
Hanshi Sun
Huitao Yang
Jiahao Qiu
Jiayi Pan
grad-student
Mengdi Wang
Ming Yin
Momin Haider
Peter Bartlett