Zhiheng Xi
- Papers: 36
Authored papers (36)
AI Can Learn Scientific Taste
arXiv 2026
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
arXiv 2026
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models
arXiv 2026
LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening
arXiv 2026
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment
arXiv 2026
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development
arXiv 2026
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment
arXiv 2026
SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents
arXiv 2026
FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions
arXiv 2026
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
arXiv 2025
CritiQ: Mining Data Quality Criteria from Human Preferences
arXiv 2025
Better Process Supervision with Bi-directional Rewarding Signals
arXiv 2025
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
arXiv 2025
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
arXiv 2025
Pre-Trained Policy Discriminators are General Reward Models
arXiv 2025
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
arXiv 2025
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
arXiv 2025
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
arXiv 2025
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments
arXiv 2025
PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts
arXiv 2025
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning
arXiv 2025
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
arXiv 2024
Distill Visual Chart Reasoning Ability from LLMs to MLLMs
arXiv 2024
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
arXiv 2024
Secrets of RLHF in Large Language Models Part II: Reward Modeling
arXiv 2024
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
arXiv 2024
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
arXiv 2024
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
arXiv 2024
Multi-Programming Language Sandbox for LLMs
arXiv 2024
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
arXiv 2024
MouSi: Poly-Visual-Expert Vision-Language Models
arXiv 2024
Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models
arXiv 2024
LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin
arXiv 2023
The Rise and Potential of Large Language Model Based Agents: A Survey
arXiv 2023
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement
arXiv 2023
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models
arXiv 2023
Affiliations
Frequent co-authors
10 from 36 papers