Haoqin Tu

ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning

arXiv 2026

MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

arXiv 2026

From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

arXiv 2026

Target-Oriented Pretraining Data Selection via Neuron-Activated Graph

arXiv 2026

VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation

arXiv 2026

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

arXiv 2026

Chasing the Public Score: User Pressure and Evaluation Exploitation in Coding Agent Workflows

arXiv 2026

AHELM: A Holistic Evaluation of Audio-Language Models

arXiv 2025

SpatialThinker: Reinforcing Scene Graph-Grounded Spatial Reasoning via Dense Rewards

arXiv 2025

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

ICCV 2025

Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning

arXiv 2025

How Far Are We From AGI: Are LLMs All We Need?

arXiv 2024

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

arXiv 2024

What If We Recaption Billions of Web Images with LLaMA-3?

arXiv 2024

Autoregressive Pretraining with Mamba in Vision

arXiv 2024