Step-GUI Technical Report

Topics

Computer Use Agents, Image Understanding, Language Modeling

Abstract

Recent advances in multimodal large language models unlock unprecedented opportunities for GUI automation. However, a fundamental challenge remains: how can high-quality training data be acquired efficiently while keeping annotations reliable? We introduce a self-evolving training pipeline powered by the Calibrated Step Reward System, which converts model-generated trajectories into reliable training signals through trajectory-level calibration, achieving >90% annotation accuracy at 10-100x lower cost. Leveraging this pipeline, we introduce Step-GUI, a family of models (4B/8B) that achieves state-of-the-art GUI performance (8B: 80.2% on AndroidWorld, 48.5% on OSWorld, 62.6% on ScreenSpot-Pro) while maintaining robust general capabilities. As GUI agent capabilities improve, practical deployment demands standardized interfaces across heterogeneous devices that also protect user privacy. To this end, we propose GUI-MCP, the first Model Context Protocol for GUI automation, with a hierarchical architecture that combines low-level atomic operations with high-level task delegation to local specialist models, enabling high-privacy execution in which sensitive data stays on-device. Finally, to assess whether agents can handle authentic everyday usage, we introduce AndroidDaily, a benchmark grounded in real-world mobile usage patterns, with 3146 static actions and 235 end-to-end tasks across high-frequency daily scenarios (8B: static 89.91%, end-to-end 52.50%). Our work advances the development of practical GUI agents and demonstrates strong potential for real-world deployment in everyday digital interactions.

Authors

Daxin Jiang, Xin Liu, Jing Li, Hao Wu, Min Xu, Hang Li, Shuang Luo, Na Wang, Xin Huang, Liang Zhao, Guopeng Li, Zheng Ge, Yibo Zhu, Binxing Jiao, Xiangyu Zhang, Xin Zhou, Guodong Liu, Jingyang Zhang, Jia Wang, Dong Li, Hongming Chen, Xu Zhou, Qiong Gao, Lei Lei, Wen Sun, Ning Wang, Brian Li, JingJing Xie, Kaijun Tan, Kang An, Lieyu Shi, Liguo Tan, Mengqiang Ren, Shiliang Yang, Xiaojia Liu, Xuanti Feng, Xuedan Cai, Yeqing Shen, Yingxiu Zhao, Zejia Weng, Zhiguo Huang, Manjiao Liu, Yukang Shi, Nan Wu, Ziyang Meng, Zhonghao Yan, Mei Chen, Shuli Gao, Xuan Wen, Liying Shi, Haolong Yan, Yineng Deng, Chenyang Li, Junhao Huang, Jin Gao, Xingbin Liu, Zhirui Wang, Xiaojie Hou, Zhimin Fan, Mi Yang, Mengmeng Duan, Danxun Liang, Hang Cheng, Jie Dong, Renjie Yu, Shunshan Li, Yiting Dai, Yingdan Liang, Zelin Chen, Chengxu Yan, Chunqin Xu, Fengqiong Xiao, Guanghao Fan, Guozhen Peng, Hongbing Li, Jianyong Li, Jiaju Ren, Jiayu Yuan, Jianpeng Yin, Kai Cao, Mao Luo, Mingxin Wan, Peiyao Ma, Qingzhou Zhang, Qiao Wang, Qinlin Zeng, Qiongyao Li, Shangwu Zhong, Shaofan Liu, Shisi Gao, Xianwei Zhu, Xin Liang, Yunfang Xu, Yuqing Zeng, Yixun Zhang, Zhuoyu Wang