Haodong Duan
- Papers: 35
Authored papers (35)
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation
arXiv 2026
RISE-Video: Can Video Generators Decode Implicit World Rules?
arXiv 2026
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
arXiv 2026
VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification
arXiv 2026
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
arXiv 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
arXiv 2025
Visual Agentic Reinforcement Fine-Tuning
arXiv 2025
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
CVPR 2025
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
arXiv 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
arXiv 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
arXiv 2025
Redundancy Principles for MLLMs Benchmarks
arXiv 2025
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
arXiv 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
arXiv 2025
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning
arXiv 2025
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
arXiv 2025
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
arXiv 2025
SPARK: Synergistic Policy And Reward Co-Evolving Framework
arXiv 2025
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
arXiv 2025
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
arXiv 2025
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
arXiv 2025
Think Visually, Reason Textually: Vision-Language Synergy in ARC
arXiv 2025
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
arXiv 2025
MM-IFEngine: Towards Multimodal Instruction Following
arXiv 2025
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
arXiv 2025
SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence
arXiv 2025
NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context?
arXiv 2024
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
arXiv 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
arXiv 2024
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
arXiv 2024
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
arXiv 2024
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
arXiv 2024
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
arXiv 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
arXiv 2024
BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues
arXiv 2023
Frequent co-authors
10 from 35 papers