Hanze Dong
- Papers: 15
Authored papers (15)
Self-Hinting Language Models Enhance Reinforcement Learning
arXiv 2026
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
arXiv 2025
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
arXiv 2025
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
arXiv 2025
Scalable Chain of Thoughts via Elastic Reasoning
arXiv 2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
arXiv 2025
Fractured Chain-of-Thought Reasoning
arXiv 2025
Offline Reinforcement Learning for LLM Multi-Step Reasoning
arXiv 2024
ThinK: Thinner Key Cache by Query-Driven Pruning
arXiv 2024
Entropy-Regularized Process Reward Model
arXiv 2024
Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
arXiv 2024
FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation
arXiv 2024
Mitigating the Alignment Tax of RLHF
arXiv 2023
Local Augmentation for Graph Neural Networks
Weakly Supervised Disentangled Generative Causal Representation Learning
Frequent co-authors
10 from 15 papers