Yu Su

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

arXiv 2025

An Illusion of Progress? Assessing the Current State of Web Agents

arXiv 2025

ARM: Adaptive Reasoning Model

arXiv 2025

Prompt-CAM: A Simpler Interpretable Transformer for Fine-Grained Analysis

arXiv 2025

SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

arXiv 2025

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

arXiv 2025

BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning

arXiv 2025

RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments

arXiv 2025

MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools

arXiv 2025

Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge

arXiv 2025

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

arXiv 2025

Is Extending Modality The Right Path Towards Omni-Modality?

arXiv 2025

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

arXiv 2024

TravelPlanner: A Benchmark for Real-World Planning with Language Agents

arXiv 2024

RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics

CVPR 2025

GPT-4V(ision) is a Generalist Web Agent, if Grounded

arXiv 2024

Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

arXiv 2024

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

arXiv 2024

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

arXiv 2024

LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error

arXiv 2024

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

arXiv 2024

VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images

arXiv 2024

A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents

arXiv 2024

Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning

arXiv 2024

When is Tree Search Useful for LLM Planning? It Depends on the Discriminator

arXiv 2024

Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments

arXiv 2024

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

arXiv 2024

Mind2Web: Towards a Generalist Agent for the Web

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

CVPR 2024

AgentBench: Evaluating LLMs as Agents

arXiv 2023

MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing

NeurIPS 2023

BioCLIP: A Vision Foundation Model for the Tree of Life

CVPR 2024

Reviving the Context: Camera Trap Species Classification as Link Prediction on Multimodal Knowledge Graphs

arXiv 2023

Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts

arXiv 2023

A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis

arXiv 2023

Automatic Evaluation of Attribution by Large Language Models

arXiv 2023

Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors

arXiv 2023

Biomedical Language Models are Robust to Sub-optimal Tokenization

arXiv 2023