Ganqu Cui
Tsinghua/Shanghai AI Lab researcher known for UltraFeedback, PRM800K-style process reward modeling, and open alignment data.
- Role: researcher
- Currently at: Shanghai AI Laboratory
- Twitter: twitter.com/cgq2333
- GitHub: github.com/cgq15
- Scholar: scholar.google.com/citations
- Papers: 31
Authored papers (31)
Post-Trained MoE Can Skip Half Experts via Self-Distillation
arXiv 2026
InCoder-32B: Code Foundation Model for Industrial Scenarios
arXiv 2026
TEMPO: Scaling Test-time Training for Large Reasoning Models
arXiv 2026
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning
arXiv 2026
P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads
arXiv 2026
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
arXiv 2025
TTRL: Test-Time Reinforcement Learning
arXiv 2025
MiniCPM4: Ultra-Efficient LLMs on End Devices
arXiv 2025
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
arXiv 2025
Learning to Reason under Off-Policy Guidance
arXiv 2025
Process Reinforcement through Implicit Rewards
arXiv 2025
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
arXiv 2025
FlowRL: Matching Reward Distributions for LLM Reasoning
arXiv 2025
A Survey of Reinforcement Learning for Large Reasoning Models
arXiv 2025
P1: Mastering Physics Olympiads with Reinforcement Learning
arXiv 2025
From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones
arXiv 2025
RLPR: Extrapolating RLVR to General Domains without Verifiers
arXiv 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
arXiv 2025
UltraIF: Advancing Instruction Following from the Wild
arXiv 2025
From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning
arXiv 2025
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
arXiv 2025
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
CVPR 2025 1
Free Process Rewards without Process Labels
arXiv 2024
Noise Contrastive Alignment of Language Models with Explicit Rewards
arXiv 2024
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
arXiv 2024
Advancing LLM Reasoning Generalists with Preference Trees
arXiv 2024
UltraMedical: Building Specialized Generalists in Biomedicine
arXiv 2024
UltraFeedback: Boosting Language Models with High-quality Feedback
ICML
Tool Learning with Foundation Models
arXiv 2023
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
CVPR 2024 1
Exploring the Universal Vulnerability of Prompt-based Learning Paradigm
Findings (NAACL) 2022 7
Tool contributions (1)
Affiliations
Previously
Frequent co-authors (10, from 31 papers)
Ning Ding
researcher
Zhiyuan Liu
professor
Bowen Zhou
professor
Maosong Sun
professor
Yu Cheng
Kaiyan Zhang
Lifan Yuan
grad-student
Yuxin Zuo
Weize Chen
Bingxiang He