
Qi Zhang

Papers
106


Attribution: Semantic Scholar

Authored papers (106)

Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models

arXiv 2026

2026

SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing

arXiv 2026

2026

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

arXiv 2026

2026

CL-bench: A Benchmark for Context Learning

arXiv 2026

2026

Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control

arXiv 2026

2026

FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions

arXiv 2026

2026

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents

arXiv 2026

2026

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

arXiv 2026

2026

OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment

arXiv 2026

2026

CCTU: A Benchmark for Tool Use under Complex Constraints

arXiv 2026

2026

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

arXiv 2026

2026

Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

arXiv 2026

2026

Can Deep Research Agents Find and Organize? Evaluating the Synthesis Gap with Expert Taxonomies

arXiv 2026

2026

Agri-R1: Agricultural Reasoning for Disease Diagnosis via Automated-Synthesis and Reinforcement Learning

arXiv 2026

2026

Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning

arXiv 2025

2025

VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

arXiv 2025

2025

RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning

arXiv 2025

2025

WorldPM: Scaling Human Preference Modeling

arXiv 2025

2025

A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models

arXiv 2025

2025

Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs

arXiv 2025

2025

Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric

arXiv 2025

2025

HealthiVert-GAN: A Novel Framework of Pseudo-Healthy Vertebral Image Synthesis for Interpretable Compression Fracture Grading

arXiv 2025

2025

Better Process Supervision with Bi-directional Rewarding Signals

arXiv 2025

2025

Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction

arXiv 2025

2025

A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning

arXiv 2025

2025

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

arXiv 2025

2025

GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors

arXiv 2025

2025

BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping

arXiv 2025

2025

Pre-Trained Policy Discriminators are General Reward Models

arXiv 2025

2025

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

arXiv 2025

2025

BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset

arXiv 2025

2025

Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments

arXiv 2025

2025

Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning

arXiv 2025

2025

PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts

arXiv 2025

2025

LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

arXiv 2025

2025

UFO: A UI-Focused Agent for Windows OS Interaction

arXiv 2024

2024

EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models

arXiv 2024

2024

Large Action Models: From Inception to Implementation

arXiv 2024

2024

TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution

CVPR 2025

2024

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

arXiv 2024

2024

NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction

arXiv 2024

2024

Distill Visual Chart Reasoning Ability from LLMs to MLLMs

arXiv 2024

2024

SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents

arXiv 2024

2024

Length Generalization of Causal Transformers without Position Encoding

arXiv 2024

2024

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

arXiv 2024

2024

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

arXiv 2024

2024

TableGPT2: A Large Multimodal Model with Tabular Data Integration

arXiv 2024

2024

E5-V: Universal Embeddings with Multimodal Large Language Models

arXiv 2024

2024

Large Language Model-Brained GUI Agents: A Survey

arXiv 2024

2024

Multi-Programming Language Sandbox for LLMs

arXiv 2024

2024

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

arXiv 2024

2024

LongHeads: Multi-Head Attention is Secretly a Long Context Processor

arXiv 2024

2024

MouSi: Poly-Visual-Expert Vision-Language Models

arXiv 2024

2024

ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios

arXiv 2024

2024

EfficientRAG: Efficient Retriever for Multi-Hop Question Answering

arXiv 2024

2024

Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning

arXiv 2024

2024

Non-negative Contrastive Learning

arXiv 2024

2024

RMB: Comprehensively Benchmarking Reward Models in LLM Alignment

arXiv 2024

2024

Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling

arXiv 2024

2024

Beyond Boundaries: Learning a Universal Entity Taxonomy across Datasets and Languages for Open Named Entity Recognition

arXiv 2024

2024

CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models

ICCV 2025

2024

Hero-SR: One-Step Diffusion for Super-Resolution with Human Perception Priors

arXiv 2024

2024

Secrets of RLHF in Large Language Models Part II: Reward Modeling

arXiv 2024

2024

TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities

arXiv 2024

2024

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

arXiv 2024

2024

MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning

arXiv 2024

2024

When Graph meets Multimodal: Benchmarking on Multimodal Attributed Graphs Learning

arXiv 2024

2024

DocFusion: A Unified Framework for Document Parsing Tasks

arXiv 2024

2024

Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments

arXiv 2024

2024

LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration

arXiv 2024

2024

On the Tip of the Tongue: Analyzing Conceptual Representation in Large Language Models with Reverse-Dictionary Probe

arXiv 2024

2024

FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding

arXiv 2024

2024

TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use

arXiv 2024

2024

Aligning Large Language Models from Self-Reference AI Feedback with one General Principle

arXiv 2024

2024

SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance

arXiv 2024

2024

Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models

arXiv 2024

2024

Are Large Language Models Good Prompt Optimizers?

arXiv 2024

2024

GS-IR: 3D Gaussian Splatting for Inverse Rendering

CVPR 2024

2023

LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin

arXiv 2023

2023

InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction

arXiv 2023

2023

TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models

arXiv 2023

2023

Movie101: A New Movie Understanding Benchmark

arXiv 2023

2023

CoCA: Fusing Position Embedding with Collinear Constrained Attention in Transformers for Long Context Window Extending

arXiv 2023

2023

Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement

arXiv 2023

2023

Democratizing Reasoning Ability: Tailored Learning from Large Language Model

arXiv 2023

2023

On the Generalization of Multi-modal Contrastive Learning

arXiv 2023

2023

Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation

arXiv 2023

2023

Orthogonal Subspace Learning for Language Model Continual Learning

arXiv 2023

2023

The Rise and Potential of Large Language Model Based Agents: A Survey

arXiv 2023

2023

Dual-Alignment Pre-training for Cross-lingual Sentence Embedding

arXiv 2023

2023

IRGen: Generative Modeling for Image Retrieval

arXiv 2023

2023

Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction

arXiv 2023

2023

Universal Multi-modal Entity Alignment via Iteratively Fusing Modality Similarity Paths

arXiv 2023

2023

Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning

arXiv 2023

2023

RE-Matching: A Fine-Grained Semantic Matching Method for Zero-Shot Relation Extraction

arXiv 2023

2023

Efficient Maximum Fair Clique Search over Large Networks

arXiv 2023

2023

PromptBERT: Improving BERT Sentence Embeddings with Prompts

arXiv 2022

2022

UV Volumes for Real-time Rendering of Editable Free-view Human Performance

CVPR 2023

2022

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings

arXiv 2022

2022

Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

arXiv 2022

2022

PanGu-Coder: Program Synthesis with Function-Level Language Modeling

arXiv 2022

2022

Nonlinear Sufficient Dimension Reduction for Distribution-on-Distribution Regression

arXiv 2022

2022

NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

arXiv 2021

2021

Thinking Clearly, Talking Fast: Concept-Guided Non-Autoregressive Generation for Open-Domain Dialogue Systems

EMNLP 2021

2021

BARS-CTR: Open Benchmarking for Click-Through Rate Prediction

arXiv 2020

2020

Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis

EMNLP 2020

2020

Affiliations

No known affiliations.

Frequent co-authors

10 (from 106 papers)