Qingpeng Cai
4 papers
Authored papers
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
arXiv 2025
Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention
arXiv 2024
Multi-Task Recommendations with Reinforcement Learning
arXiv 2023
Two-Stage Constrained Actor-Critic for Short Video Recommendation
arXiv 2023
Affiliations
No known affiliations.
Frequent co-authors
10 from 4 papers