Penghui Qi
9 papers
Authored papers
Rethinking the Trust Region in LLM Reinforcement Learning
arXiv 2026
Revisiting Parameter Server in LLM Post-Training
arXiv 2026
Understanding R1-Zero-Like Training: A Critical Perspective
arXiv 2025
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization
arXiv 2025
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
arXiv 2025
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
arXiv 2025
Defeating the Training-Inference Mismatch via FP16
arXiv 2025
Pipeline Parallelism with Controllable Memory
arXiv 2024
Balancing Pipeline Parallelism with Vocabulary Parallelism
arXiv 2024
Affiliations
No known affiliations.
Frequent co-authors
10 from 9 papers