Ganqu Cui

Tsinghua/Shanghai AI Lab researcher known for UltraFeedback, implicit process reward modeling (process rewards learned without step-level labels), and open alignment data.

Role
researcher
Papers
31

31 papers · 1 tool contribution

Authored papers

31

Post-Trained MoE Can Skip Half Experts via Self-Distillation

arXiv 2026

2026

InCoder-32B: Code Foundation Model for Industrial Scenarios

arXiv 2026

2026

TEMPO: Scaling Test-time Training for Large Reasoning Models

arXiv 2026

2026

Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning

arXiv 2026

2026

P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads

arXiv 2026

2026

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

arXiv 2025

2025

TTRL: Test-Time Reinforcement Learning

arXiv 2025

2025

MiniCPM4: Ultra-Efficient LLMs on End Devices

arXiv 2025

2025

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

arXiv 2025

2025

Learning to Reason under Off-Policy Guidance

arXiv 2025

2025

Process Reinforcement through Implicit Rewards

arXiv 2025

2025

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

arXiv 2025

2025

FlowRL: Matching Reward Distributions for LLM Reasoning

arXiv 2025

2025

A Survey of Reinforcement Learning for Large Reasoning Models

arXiv 2025

2025

P1: Mastering Physics Olympiads with Reinforcement Learning

arXiv 2025

2025

From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones

arXiv 2025

2025

RLPR: Extrapolating RLVR to General Domains without Verifiers

arXiv 2025

2025

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

arXiv 2025

2025

UltraIF: Advancing Instruction Following from the Wild

arXiv 2025

2025

From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning

arXiv 2025

2025

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

arXiv 2025

2025

RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

CVPR 2025 1

2024

Free Process Rewards without Process Labels

arXiv 2024

2024

Noise Contrastive Alignment of Language Models with Explicit Rewards

arXiv 2024

2024

Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment

arXiv 2024

2024

Advancing LLM Reasoning Generalists with Preference Trees

arXiv 2024

2024

UltraMedical: Building Specialized Generalists in Biomedicine

arXiv 2024

2024

UltraFeedback: Boosting Language Models with High-quality Feedback

ICML

2023

Tool Learning with Foundation Models

arXiv 2023

2023

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

CVPR 2024 1

2023

Exploring the Universal Vulnerability of Prompt-based Learning Paradigm

Findings (NAACL) 2022 7

2022

Tool contributions

1

Affiliations

Currently at

Shanghai AI Laboratory

researcher · research group

Frequent co-authors

10

from 31 papers