Qi Zhang
- Papers: 106
Authored papers (106)
Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models
arXiv 2026
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing
arXiv 2026
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models
arXiv 2026
CL-bench: A Benchmark for Context Learning
arXiv 2026
Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control
arXiv 2026
FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions
arXiv 2026
SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents
arXiv 2026
LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening
arXiv 2026
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment
arXiv 2026
CCTU: A Benchmark for Tool Use under Complex Constraints
arXiv 2026
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?
arXiv 2026
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment
arXiv 2026
Can Deep Research Agents Find and Organize? Evaluating the Synthesis Gap with Expert Taxonomies
arXiv 2026
Agri-R1: Agricultural Reasoning for Disease Diagnosis via Automated-Synthesis and Reinforcement Learning
arXiv 2026
Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning
arXiv 2025
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
arXiv 2025
RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning
arXiv 2025
WorldPM: Scaling Human Preference Modeling
arXiv 2025
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
arXiv 2025
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
arXiv 2025
Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric
arXiv 2025
HealthiVert-GAN: A Novel Framework of Pseudo-Healthy Vertebral Image Synthesis for Interpretable Compression Fracture Grading
arXiv 2025
Better Process Supervision with Bi-directional Rewarding Signals
arXiv 2025
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
arXiv 2025
A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning
arXiv 2025
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
arXiv 2025
GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors
arXiv 2025
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
arXiv 2025
Pre-Trained Policy Discriminators are General Reward Models
arXiv 2025
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
arXiv 2025
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
arXiv 2025
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments
arXiv 2025
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning
arXiv 2025
PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts
arXiv 2025
LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models
arXiv 2025
UFO: A UI-Focused Agent for Windows OS Interaction
arXiv 2024
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
arXiv 2024
Large Action Models: From Inception to Implementation
arXiv 2024
TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution
CVPR 2025 1
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
arXiv 2024
NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction
arXiv 2024
Distill Visual Chart Reasoning Ability from LLMs to MLLMs
arXiv 2024
SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents
arXiv 2024
Length Generalization of Causal Transformers without Position Encoding
arXiv 2024
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
arXiv 2024
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
arXiv 2024
TableGPT2: A Large Multimodal Model with Tabular Data Integration
arXiv 2024
E5-V: Universal Embeddings with Multimodal Large Language Models
arXiv 2024
Large Language Model-Brained GUI Agents: A Survey
arXiv 2024
Multi-Programming Language Sandbox for LLMs
arXiv 2024
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
arXiv 2024
LongHeads: Multi-Head Attention is Secretly a Long Context Processor
arXiv 2024
MouSi: Poly-Visual-Expert Vision-Language Models
arXiv 2024
ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
arXiv 2024
EfficientRAG: Efficient Retriever for Multi-Hop Question Answering
arXiv 2024
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning
arXiv 2024
Non-negative Contrastive Learning
arXiv 2024
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
arXiv 2024
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
arXiv 2024
Beyond Boundaries: Learning a Universal Entity Taxonomy across Datasets and Languages for Open Named Entity Recognition
arXiv 2024
CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models
ICCV 2025
Hero-SR: One-Step Diffusion for Super-Resolution with Human Perception Priors
arXiv 2024
Secrets of RLHF in Large Language Models Part II: Reward Modeling
arXiv 2024
TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities
arXiv 2024
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
arXiv 2024
MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning
arXiv 2024
When Graph meets Multimodal: Benchmarking on Multimodal Attributed Graphs Learning
arXiv 2024
DocFusion: A Unified Framework for Document Parsing Tasks
arXiv 2024
Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments
arXiv 2024
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
arXiv 2024
On the Tip of the Tongue: Analyzing Conceptual Representation in Large Language Models with Reverse-Dictionary Probe
arXiv 2024
FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding
arXiv 2024
TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use
arXiv 2024
Aligning Large Language Models from Self-Reference AI Feedback with one General Principle
arXiv 2024
SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance
arXiv 2024
Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models
arXiv 2024
Are Large Language Models Good Prompt Optimizers?
arXiv 2024
GS-IR: 3D Gaussian Splatting for Inverse Rendering
CVPR 2024 1
LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin
arXiv 2023
InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction
arXiv 2023
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models
arXiv 2023
Movie101: A New Movie Understanding Benchmark
arXiv 2023
CoCA: Fusing Position Embedding with Collinear Constrained Attention in Transformers for Long Context Window Extending
arXiv 2023
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement
arXiv 2023
Democratizing Reasoning Ability: Tailored Learning from Large Language Model
arXiv 2023
On the Generalization of Multi-modal Contrastive Learning
arXiv 2023
Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation
arXiv 2023
Orthogonal Subspace Learning for Language Model Continual Learning
arXiv 2023
The Rise and Potential of Large Language Model Based Agents: A Survey
arXiv 2023
Dual-Alignment Pre-training for Cross-lingual Sentence Embedding
arXiv 2023
IRGen: Generative Modeling for Image Retrieval
arXiv 2023
Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
arXiv 2023
Universal Multi-modal Entity Alignment via Iteratively Fusing Modality Similarity Paths
arXiv 2023
Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning
arXiv 2023
RE-Matching: A Fine-Grained Semantic Matching Method for Zero-Shot Relation Extraction
arXiv 2023
Efficient Maximum Fair Clique Search over Large Networks
arXiv 2023
PromptBERT: Improving BERT Sentence Embeddings with Prompts
arXiv 2022
UV Volumes for Real-time Rendering of Editable Free-view Human Performance
CVPR 2023 1
Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings
arXiv 2022
Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval
arXiv 2022
PanGu-Coder: Program Synthesis with Function-Level Language Modeling
arXiv 2022
Nonlinear Sufficient Dimension Reduction for Distribution-on-Distribution Regression
arXiv 2022
NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results
arXiv 2021
Thinking Clearly, Talking Fast: Concept-Guided Non-Autoregressive Generation for Open-Domain Dialogue Systems
EMNLP 2021 11
BARS-CTR: Open Benchmarking for Click-Through Rate Prediction
arXiv 2020
Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis
EMNLP 2020 11