Tao Gui
- Papers: 61
Authored papers (61)
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
arXiv 2026
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models
arXiv 2026
CL-bench: A Benchmark for Context Learning
arXiv 2026
Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control
arXiv 2026
SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents
arXiv 2026
FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions
arXiv 2026
LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening
arXiv 2026
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment
arXiv 2026
CCTU: A Benchmark for Tool Use under Complex Constraints
arXiv 2026
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development
arXiv 2026
Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models
arXiv 2026
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment
arXiv 2026
Can Deep Research Agents Find and Organize? Evaluating the Synthesis Gap with Expert Taxonomies
arXiv 2026
Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning
arXiv 2025
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
arXiv 2025
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
arXiv 2025
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
arXiv 2025
Pre-Trained Policy Discriminators are General Reward Models
arXiv 2025
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
arXiv 2025
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments
arXiv 2025
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
arXiv 2025
Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric
arXiv 2025
WorldPM: Scaling Human Preference Modeling
arXiv 2025
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
arXiv 2025
CritiQ: Mining Data Quality Criteria from Human Preferences
arXiv 2025
Better Process Supervision with Bi-directional Rewarding Signals
arXiv 2025
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning
arXiv 2025
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
arXiv 2025
PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts
arXiv 2025
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
arXiv 2024
Secrets of RLHF in Large Language Models Part II: Reward Modeling
arXiv 2024
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
arXiv 2024
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
arXiv 2024
LongHeads: Multi-Head Attention is Secretly a Long Context Processor
arXiv 2024
MouSi: Poly-Visual-Expert Vision-Language Models
arXiv 2024
ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
arXiv 2024
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
arXiv 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
arXiv 2024
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
arXiv 2024
Beyond Boundaries: Learning a Universal Entity Taxonomy across Datasets and Languages for Open Named Entity Recognition
arXiv 2024
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning
arXiv 2024
Distill Visual Chart Reasoning Ability from LLMs to MLLMs
arXiv 2024
Length Generalization of Causal Transformers without Position Encoding
arXiv 2024
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
arXiv 2024
Multi-Programming Language Sandbox for LLMs
arXiv 2024
TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities
arXiv 2024
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
arXiv 2024
TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use
arXiv 2024
SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance
arXiv 2024
Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models
arXiv 2024
Are Large Language Models Good Prompt Optimizers?
arXiv 2024
LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin
arXiv 2023
InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction
arXiv 2023
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models
arXiv 2023
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement
arXiv 2023
Orthogonal Subspace Learning for Language Model Continual Learning
arXiv 2023
The Rise and Potential of Large Language Model Based Agents: A Survey
arXiv 2023
Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
arXiv 2023
Universal Multi-modal Entity Alignment via Iteratively Fusing Modality Similarity Paths
arXiv 2023
RE-Matching: A Fine-Grained Semantic Matching Method for Zero-Shot Relation Extraction
arXiv 2023
Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation
arXiv 2023
Frequent co-authors
10 from 61 papers