Wentao Zhang
- Papers: 76
Authored papers: 76
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models
arXiv 2026
AgentOrchestra: Orchestrating Multi-Agent Intelligence with the Tool-Environment-Agent(TEA) Protocol
arXiv 2025
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding
arXiv 2026
One-Eval: An Agentic System for Automated and Traceable LLM Evaluation
arXiv 2026
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
arXiv 2026
Towards Automated Kernel Generation in the Era of LLMs
arXiv 2026
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
arXiv 2026
ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch
arXiv 2026
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence
arXiv 2026
PEARL: Personalized Streaming Video Understanding Model
arXiv 2026
CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation
arXiv 2026
GENIUS: Generative Fluid Intelligence Evaluation Suite
arXiv 2026
MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
arXiv 2026
Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers
arXiv 2026
Agri-R1: Agricultural Reasoning for Disease Diagnosis via Automated-Synthesis and Reinforcement Learning
arXiv 2026
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
arXiv 2025
MemOS: A Memory OS for AI System
arXiv 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
preprint
VABench: A Comprehensive Benchmark for Audio-Video Generation
arXiv 2025
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
arXiv 2025
Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks
ICCV 2025
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
arXiv 2025
WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes
arXiv 2025
MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification
arXiv 2025
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
arXiv 2025
Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification
arXiv 2025
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
arXiv 2025
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts
arXiv 2025
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
arXiv 2025
Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling
arXiv 2025
Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation
arXiv 2025
MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning
arXiv 2025
Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge
arXiv 2025
On Path to Multimodal Historical Reasoning: HistBench and HistAgent
arXiv 2025
Interactive Training: Feedback-Driven Neural Network Optimization
arXiv 2025
VCU-Bridge: Hierarchical Visual Connotation Understanding via Semantic Bridging
arXiv 2025
Taming LLMs by Scaling Learning Rates with Gradient Grouping
arXiv 2025
MRAMG-Bench: A Comprehensive Benchmark for Advancing Multimodal Retrieval-Augmented Multimodal Generation
arXiv 2025
Pixels, Patterns, but No Poetry: To See The World like Humans
arXiv 2025
Let's Verify Math Questions Step by Step
arXiv 2025
FlipVQA-Miner: Cross-Page Visual Question-Answer Mining from Textbooks
arXiv 2025
Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning
arXiv 2025
DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM
arXiv 2025
From Uniform to Heterogeneous: Tailoring Policy Optimization to Every Token's Nature
arXiv 2025
RARE: Retrieval-Augmented Reasoning Modeling
arXiv 2025
Baichuan-Omni-1.5 Technical Report
arXiv 2025
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
arXiv 2025
SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models
arXiv 2025
VADE: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal RL
arXiv 2025
DeepSeek-V3 Technical Report
arXiv 2024
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
arXiv 2024
Cradle: Empowering Foundation Agents Towards General Computer Control
arXiv 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
arXiv 2024
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
arXiv 2024
CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
arXiv 2024
VecCity: A Taxonomy-guided Library for Map Entity Representation Learning
arXiv 2024
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
arXiv 2024
Can LLMs be Good Graph Judge for Knowledge Graph Construction?
arXiv 2024
Synth-Empathy: Towards High-Quality Synthetic Empathy Data
arXiv 2024
AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning
arXiv 2024
BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search
arXiv 2024
MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark
arXiv 2024
Semantic Score Distillation Sampling for Compositional Text-to-3D Generation
arXiv 2024
Consistency Flow Matching: Defining Straight Flows with Velocity Consistency
arXiv 2024
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
ICCV 2025
QAEncoder: Towards Aligned Representation Learning in Question Answering System
arXiv 2024
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
arXiv 2024
EditWorld: Simulating World Dynamics for Instruction-Following Image Editing
arXiv 2024
SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models
arXiv 2024
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining
arXiv 2024
SysBench: Can Large Language Models Follow System Messages?
arXiv 2024
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning
arXiv 2024
Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis
arXiv 2024
VQGraph: Rethinking Graph Representation Space for Bridging GNNs and MLPs
arXiv 2023
Diffusion Models: A Comprehensive Survey of Methods and Applications
arXiv 2022
Evaluating Deep Graph Neural Networks
Affiliations
Frequent co-authors
10 from 76 papers