Haitao Mi
Authored papers (22)
Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration
arXiv 2026
The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context
arXiv 2026
Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification
arXiv 2026
Free(): Learning to Forget in Malloc-Only Reasoning Models
arXiv 2026
Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories
arXiv 2026
Self-Rewarding Vision-Language Model via Reasoning Decomposition
arXiv 2025
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
arXiv 2025
WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model
arXiv 2025
R-Zero: Self-Evolving Reasoning LLM from Zero Data
arXiv 2025
Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents
arXiv 2025
The End of Manual Decoding: Towards Truly End-to-End Language Models
arXiv 2025
Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls
arXiv 2025
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
arXiv 2025
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
arXiv 2025
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
arXiv 2025
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
arXiv 2025
Scaling Synthetic Data Creation with 1,000,000,000 Personas
arXiv 2024
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
arXiv 2024
Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching
arXiv 2024
HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows
arXiv 2024
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
arXiv 2024
The Trickle-down Impact of Reward (In-)consistency on RLHF
arXiv 2023
Affiliations
Frequent co-authors
10 from 22 papers