Yu Gu

Magma: A Foundation Model for Multimodal AI Agents

CVPR 2025

Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts

arXiv 2025

SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

arXiv 2025

X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains

arXiv 2025

HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization

arXiv 2025

Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge

arXiv 2025

RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts

arXiv 2025

LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences

arXiv 2025

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

arXiv 2024

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

arXiv 2024

Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

arXiv 2024

Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments

arXiv 2024

STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment

arXiv 2024

Mind2Web: Towards a Generalist Agent for the Web

Reviving the Context: Camera Trap Species Classification as Link Prediction on Multimodal Knowledge Graphs

arXiv 2023

AgentBench: Evaluating LLMs as Agents

arXiv 2023

KoLA: Carefully Benchmarking World Knowledge of Large Language Models

arXiv 2023