Attribution
When Vision Speaks for Sound
arXiv 2026
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
arXiv 2024
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
from 3 papers
Boyu Gou
Boyuan Zheng
Cheng Chang
Huan Sun
Peng Qi
Yu Su
Kai Zhang
Muhao Chen
Rui Cai
Ruohan Wang