
Shizhe Diao

Papers
31

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

31

ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents

arXiv 2026

2026

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

arXiv 2026

2026

Recursive Multi-Agent Systems

arXiv 2026

2026

Progressive Residual Warmup for Language Model Pretraining

arXiv 2026

2026

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

arXiv 2025

2025

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

arXiv 2025

2025

Fast-dLLM v2: Efficient Block-Diffusion LLM

arXiv 2025

2025

GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling

arXiv 2025

2025

Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning

arXiv 2025

2025

UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models

arXiv 2025

2025

MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving

arXiv 2025

2025

LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement

arXiv 2025

2025

LaTtE-Flow: Layerwise Timestep-Expert Flow-based Transformer

arXiv 2025

2025

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

arXiv 2025

2025

Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training

arXiv 2025

2025

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

arXiv 2024

2024

Hymba: A Hybrid-head Architecture for Small Language Models

arXiv 2024

2024

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

arXiv 2024

2024

Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards

arXiv 2024

2024

Entropy-Regularized Process Reward Model

arXiv 2024

2024

FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation

arXiv 2024

2024

SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales

arXiv 2024

2024

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

arXiv 2024

2024

Can We Verify Step by Step for Incorrect Answer Detection?

arXiv 2024

2024

Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts

ICCV 2023

2023

Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories

arXiv 2023

2023

Mitigating the Alignment Tax of RLHF

arXiv 2023

2023

Active Prompting with Chain-of-Thought for Large Language Models

arXiv 2023

2023

R-Tuning: Instructing Large Language Models to Say 'I Don't Know'

arXiv 2023

2023

Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data

arXiv 2023

2023

Plum: Prompt Learning using Metaheuristic

arXiv 2023

2023

Affiliations

No known affiliations.

Frequent co-authors

10

from 31 papers