Aviral Kumar
Authored papers (14)
Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL
arXiv 2026
InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning
arXiv 2026
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
arXiv 2025
Digi-Q: Learning Q-Value Functions for Training Device-Control Agents
arXiv 2025
Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance
arXiv 2024
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
arXiv 2024
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data
arXiv 2024
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
arXiv 2024
Training Language Models to Self-Correct via Reinforcement Learning
arXiv 2024
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
arXiv 2024
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
arXiv 2024
Unfamiliar Finetuning Examples Control How Language Models Hallucinate
arXiv 2024
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models
arXiv 2023
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
arXiv 2020
Frequent co-authors
10 from 14 papers