Difei Gao

Papers: 13

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

ShowUI-Aloha: Human-Taught GUI Agent

arXiv 2026

2026

CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing

arXiv 2026

2026

WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point

arXiv 2025

2025

Factorized Learning for Temporally Grounded Video-Language Models

arXiv 2025

2025

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

arXiv 2024

2024

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

CVPR 2025 1

2024

LOVA3: Learning to Visual Question Answering, Asking and Assessment

arXiv 2024

2024

Learning Video Context as Interleaved Multimodal Sequences

arXiv 2024

2024

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

arXiv 2023

2023

UniVTG: Towards Unified Video-Language Temporal Grounding

ICCV 2023 1

2023

CVPR 2023 Text Guided Video Editing Competition

arXiv 2023

2023

Egocentric Video-Language Pretraining

arXiv 2022

2022

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

arXiv 2022

2022

Affiliations

No known affiliations.

Frequent co-authors

from 13 papers

Mike Zheng Shou

Kevin Qinghong Lin

Jay Zhangjie Wu

Alex Jinpeng Wang

Henry Hengyuan Zhao

Joya Chen

Pengchuan Zhang

Rui Yan

Wei Liu

Xiangwu Guo