Shizhe Diao
- Papers
- 31
Authored papers (31)
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
arXiv 2026
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
arXiv 2026
Recursive Multi-Agent Systems
arXiv 2026
Progressive Residual Warmup for Language Model Pretraining
arXiv 2026
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
arXiv 2025
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
arXiv 2025
Fast-dLLM v2: Efficient Block-Diffusion LLM
arXiv 2025
GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling
arXiv 2025
Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning
arXiv 2025
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
arXiv 2025
MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving
arXiv 2025
LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement
arXiv 2025
LaTtE-Flow: Layerwise Timestep-Expert Flow-based Transformer
arXiv 2025
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
arXiv 2025
Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training
arXiv 2025
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
arXiv 2024
Hymba: A Hybrid-head Architecture for Small Language Models
arXiv 2024
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
arXiv 2024
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
arXiv 2024
Entropy-Regularized Process Reward Model
arXiv 2024
FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation
arXiv 2024
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales
arXiv 2024
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts
arXiv 2024
Can We Verify Step by Step for Incorrect Answer Detection?
arXiv 2024
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts
ICCV 2023
Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models' Memories
arXiv 2023
Mitigating the Alignment Tax of RLHF
arXiv 2023
Active Prompting with Chain-of-Thought for Large Language Models
arXiv 2023
R-Tuning: Instructing Large Language Models to Say 'I Don't Know'
arXiv 2023
Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data
arXiv 2023
Plum: Prompt Learning using Metaheuristic
arXiv 2023
Affiliations
Frequent co-authors
10 from 31 papers