Yue Zhang
- Papers: 84
Authored papers (84)
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
arXiv 2026
AutoFigure-Edit: Generating Editable Scientific Illustration
arXiv 2026
EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance
arXiv 2025
AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories
arXiv 2026
Detecting RLVR Training Data via Structural Convergence of Reasoning
arXiv 2026
FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation
arXiv 2026
AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations
arXiv 2026
PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation
arXiv 2026
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
arXiv 2026
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios
arXiv 2026
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment
arXiv 2026
V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising
arXiv 2026
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning
arXiv 2026
VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting
arXiv 2026
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
arXiv 2025
SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications
arXiv 2025
Learning to Reason under Off-Policy Guidance
arXiv 2025
DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process
arXiv 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
arXiv 2025
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
CVPR 2025 1
RewardAnything: Generalizable Principle-Following Reward Models
arXiv 2025
Error-Driven Scene Editing for 3D Grounding in Large Language Models
arXiv 2025
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
arXiv 2025
Lost in Literalism: How Supervised Training Shapes Translationese in LLMs
arXiv 2025
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
arXiv 2025
Reasoning is All You Need for Video Generalization: A Counterfactual Benchmark with Sub-question Evaluation
arXiv 2025
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
arXiv 2025
Deep Research: A Systematic Survey
arXiv 2025
DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively
arXiv 2025
LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research
arXiv 2025
LIMI: Less is More for Agency
arXiv 2025
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
arXiv 2025
Planning with Sketch-Guided Verification for Physics-Aware Video Generation
arXiv 2025
A Systematic Study of Code Obfuscation Against LLM-based Vulnerability Detection
arXiv 2025
MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation
arXiv 2025
AutoSurvey: Large Language Models Can Automatically Write Surveys
arXiv 2024
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation
arXiv 2024
Direct Preference Optimization Using Sparse Feature-Level Constraints
arXiv 2024
RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation
arXiv 2024
Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement
arXiv 2024
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
arXiv 2024
Knowledge Conflicts for LLMs: A Survey
arXiv 2024
Personality Alignment of Large Language Models
arXiv 2024
A Unified Hallucination Mitigation Framework for Large Vision-Language Models
arXiv 2024
See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses
arXiv 2024
LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation
arXiv 2024
ECon: On the Detection and Resolution of Evidence Conflicts
arXiv 2024
MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling
arXiv 2024
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
arXiv 2024
Can Language Models Learn to Skip Steps?
arXiv 2024
FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models
arXiv 2024
DocFusion: A Unified Framework for Document Parsing Tasks
arXiv 2024
Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature
arXiv 2023
A Survey on Evaluation of Large Language Models
arXiv 2023
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
arXiv 2023
MAGE: Machine-generated Text Detection in the Wild
arXiv 2023
Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4
arXiv 2023
LogiCoT: Logical Chain-of-Thought Instruction-Tuning
arXiv 2023
Understanding In-Context Learning from Repetitions
arXiv 2023
StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding
arXiv 2023
Supervised Knowledge Makes Large Language Models Better In-context Learners
arXiv 2023
LLM-enhanced Self-training for Cross-domain Constituency Parsing
arXiv 2023
Improving (Dis)agreement Detection with Inductive Social Relation Information From Comment-Reply Interactions
arXiv 2023
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
arXiv 2023
RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation
arXiv 2023
NaSGEC: a Multi-Domain Chinese Grammatical Error Correction Dataset from Native Speaker Texts
arXiv 2023
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity
arXiv 2023
GLoRE: Evaluating Logical Reasoning of Large Language Models
arXiv 2023
Chain-of-Symbol Prompting Elicits Planning in Large Language Models
arXiv 2023
Non-autoregressive Text Editing with Copy-aware Latent Alignments
arXiv 2023
Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation
arXiv 2023
Dynamics of Instruction Tuning: Each Ability of Large Language Models Has Its Own Growth Pace
arXiv 2023
TRAMS: Training-free Memory Selection for Long-range Language Modeling
arXiv 2023
Graph Pre-training for AMR Parsing and Generation
ACL 2022 5
UniSumm and SummZoo: Unified Model and Diverse Benchmark for Few-Shot Summarization
arXiv 2022
GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective
arXiv 2022
USB: A Unified Semi-supervised Learning Benchmark for Classification
arXiv 2022
DialogSum: A Real-Life Scenario Dialogue Summarization Dataset
Findings (ACL) 2021 8
Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis
ACL 2021 5
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
arXiv 2021
MuTual: A Dataset for Multi-Turn Dialogue Reasoning
LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning
arXiv 2020
Natural Language Inference in Context -- Investigating Contextual Reasoning over Long Texts
arXiv 2020
A Pilot Study for Chinese SQL Semantic Parsing