Difei Gao
- Papers: 13
Authored papers (13)
ShowUI-Aloha: Human-Taught GUI Agent
arXiv 2026
CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing
arXiv 2026
Factorized Learning for Temporally Grounded Video-Language Models
arXiv 2025
WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point
arXiv 2025
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
arXiv 2024
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
CVPR 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
arXiv 2024
Learning Video Context as Interleaved Multimodal Sequences
arXiv 2024
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
arXiv 2023
UniVTG: Towards Unified Video-Language Temporal Grounding
ICCV 2023
CVPR 2023 Text Guided Video Editing Competition
arXiv 2023
Egocentric Video-Language Pretraining
arXiv 2022
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task
arXiv 2022
Frequent co-authors
10 from 13 papers