Yue Zhang

Papers
84

Authored papers

84

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

arXiv 2026

2026

AutoFigure-Edit: Generating Editable Scientific Illustration

arXiv 2026

2026

EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance

arXiv 2025

2026

AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories

arXiv 2026

2026

Detecting RLVR Training Data via Structural Convergence of Reasoning

arXiv 2026

2026

FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation

arXiv 2026

2026

AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations

arXiv 2026

2026

PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation

arXiv 2026

2026

When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning

arXiv 2026

2026

AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios

arXiv 2026

2026

OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment

arXiv 2026

2026

V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising

arXiv 2026

2026

Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning

arXiv 2026

2026

VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting

arXiv 2026

2026

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

arXiv 2025

2025

SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications

arXiv 2025

2025

Learning to Reason under Off-Policy Guidance

arXiv 2025

2025

DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process

arXiv 2025

2025

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

arXiv 2025

2025

Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector

CVPR 2025 1

2025

RewardAnything: Generalizable Principle-Following Reward Models

arXiv 2025

2025

Error-Driven Scene Editing for 3D Grounding in Large Language Models

arXiv 2025

2025

Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model

arXiv 2025

2025

Lost in Literalism: How Supervised Training Shapes Translationese in LLMs

arXiv 2025

2025

An Empirical Analysis of Uncertainty in Large Language Model Evaluations

arXiv 2025

2025

Reasoning is All You Need for Video Generalization: A Counterfactual Benchmark with Sub-question Evaluation

arXiv 2025

2025

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

arXiv 2025

2025

Deep Research: A Systematic Survey

arXiv 2025

2025

DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively

arXiv 2025

2025

LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research

arXiv 2025

2025

LIMI: Less is More for Agency

arXiv 2025

2025

TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them

arXiv 2025

2025

Planning with Sketch-Guided Verification for Physics-Aware Video Generation

arXiv 2025

2025

A Systematic Study of Code Obfuscation Against LLM-based Vulnerability Detection

arXiv 2025

2025

MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation

arXiv 2025

2025

AutoSurvey: Large Language Models Can Automatically Write Surveys

arXiv 2024

2024

RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation

arXiv 2024

2024

Direct Preference Optimization Using Sparse Feature-Level Constraints

arXiv 2024

2024

RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation

arXiv 2024

2024

Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement

arXiv 2024

2024

Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

arXiv 2024

2024

Knowledge Conflicts for LLMs: A Survey

arXiv 2024

2024

Personality Alignment of Large Language Models

arXiv 2024

2024

A Unified Hallucination Mitigation Framework for Large Vision-Language Models

arXiv 2024

2024

See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses

arXiv 2024

2024

LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation

arXiv 2024

2024

ECon: On the Detection and Resolution of Evidence Conflicts

arXiv 2024

2024

MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling

arXiv 2024

2024

NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens

arXiv 2024

2024

Can Language Models Learn to Skip Steps?

arXiv 2024

2024

FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models

arXiv 2024

2024

DocFusion: A Unified Framework for Document Parsing Tasks

arXiv 2024

2024

Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature

arXiv 2023

2023

A Survey on Evaluation of Large Language Models

arXiv 2023

2023

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

arXiv 2023

2023

MAGE: Machine-generated Text Detection in the Wild

arXiv 2023

2023

Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4

arXiv 2023

2023

LogiCoT: Logical Chain-of-Thought Instruction-Tuning

arXiv 2023

2023

Understanding In-Context Learning from Repetitions

arXiv 2023

2023

StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding

arXiv 2023

2023

Supervised Knowledge Makes Large Language Models Better In-context Learners

arXiv 2023

2023

LLM-enhanced Self-training for Cross-domain Constituency Parsing

arXiv 2023

2023

Improving (Dis)agreement Detection with Inductive Social Relation Information From Comment-Reply Interactions

arXiv 2023

2023

PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization

arXiv 2023

2023

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

arXiv 2023

2023

NaSGEC: a Multi-Domain Chinese Grammatical Error Correction Dataset from Native Speaker Texts

arXiv 2023

2023

Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity

arXiv 2023

2023

GLoRE: Evaluating Logical Reasoning of Large Language Models

arXiv 2023

2023

Chain-of-Symbol Prompting Elicits Planning in Large Language Models

arXiv 2023

2023

Non-autoregressive Text Editing with Copy-aware Latent Alignments

arXiv 2023

2023

Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation

arXiv 2023

2023

Dynamics of Instruction Tuning: Each Ability of Large Language Models Has Its Own Growth Pace

arXiv 2023

2023

TRAMS: Training-free Memory Selection for Long-range Language Modeling

arXiv 2023

2023

Graph Pre-training for AMR Parsing and Generation

ACL 2022 5

2022

UniSumm and SummZoo: Unified Model and Diverse Benchmark for Few-Shot Summarization

arXiv 2022

2022

GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective

arXiv 2022

2022

USB: A Unified Semi-supervised Learning Benchmark for Classification

arXiv 2022

2022

DialogSum: A Real-Life Scenario Dialogue Summarization Dataset

Findings (ACL) 2021 8

2021

Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis

ACL 2021 5

2021

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

arXiv 2021

2021

MuTual: A Dataset for Multi-Turn Dialogue Reasoning

2020

LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning

arXiv 2020

2020

Natural Language Inference in Context -- Investigating Contextual Reasoning over Long Texts

arXiv 2020

2020

A Pilot Study for Chinese SQL Semantic Parsing

2019

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 84 papers)