Attribution
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
arXiv 2024
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
arXiv 2023
from 2 papers
Charlie Snell
Aviral Kumar
Charles Sun
Isadora White
Jaehoon Lee
researcher
Joey Hong
Marwa Abdulhai
Sergey Levine
professor
Yuexiang Zhai