Yu Su
- Papers: 46
Authored papers
QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks
arXiv 2026
Automatic Image-Level Morphological Trait Annotation for Organismal Images
arXiv 2026
When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents
arXiv 2026
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
arXiv 2025
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
arXiv 2025
An Illusion of Progress? Assessing the Current State of Web Agents
arXiv 2025
ARM: Adaptive Reasoning Model
arXiv 2025
Prompt-CAM: A Simpler Interpretable Transformer for Fine-Grained Analysis
arXiv 2025
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills
arXiv 2025
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
arXiv 2025
BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning
arXiv 2025
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
arXiv 2025
MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools
arXiv 2025
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
arXiv 2025
Is Extending Modality The Right Path Towards Omni-Modality?
arXiv 2025
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
arXiv 2025
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
arXiv 2024
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
arXiv 2024
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
CVPR 2025
GPT-4V(ision) is a Generalist Web Agent, if Grounded
arXiv 2024
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
arXiv 2024
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
arXiv 2024
When is Tree Search Useful for LLM Planning? It Depends on the Discriminator
arXiv 2024
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments
arXiv 2024
VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images
arXiv 2024
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
arXiv 2024
Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning
arXiv 2024
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
arXiv 2024
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
arXiv 2024
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
arXiv 2024
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
arXiv 2024
Mind2Web: Towards a Generalist Agent for the Web
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
CVPR 2024
Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts
arXiv 2023
A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis
arXiv 2023
Automatic Evaluation of Attribution by Large Language Models
arXiv 2023
Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors
arXiv 2023
Reviving the Context: Camera Trap Species Classification as Link Prediction on Multimodal Knowledge Graphs
arXiv 2023
AgentBench: Evaluating LLMs as Agents
arXiv 2023
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
NeurIPS 2023
BioCLIP: A Vision Foundation Model for the Tree of Life
CVPR 2024
Biomedical Language Models are Robust to Sub-optimal Tokenization
arXiv 2023
Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments
arXiv 2022
A Retrieve-and-Read Framework for Knowledge Graph Link Prediction
arXiv 2022
A Systematic Investigation of KB-Text Embedding Alignment at Scale
ACL 2021
Logical Natural Language Generation from Open-Domain Tables
Frequent co-authors
10 from 46 papers