Tian Liang
- Papers: 15
Authored papers (15)
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
arXiv 2025
Free(): Learning to Forget in Malloc-Only Reasoning Models
arXiv 2026
From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space
arXiv 2026
The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context
arXiv 2026
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
arXiv 2025
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
arXiv 2025
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
arXiv 2025
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
arXiv 2024
How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments
arXiv 2024
Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding
arXiv 2024
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
arXiv 2024
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
arXiv 2024
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
arXiv 2023
Exploring Human-Like Translation Strategy with Large Language Models
arXiv 2023
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models
arXiv 2023
Frequent co-authors
10 from 15 papers