Tianbao Xie

PhD student at the University of Hong Kong (XLANG Lab); lead author of OSWorld, the standard benchmark for computer-use / GUI agents.

Role: grad-student
Currently at: HKU XLANG Lab
Twitter: twitter.com/TianbaoX
GitHub: github.com/timothyxxx
Scholar: scholar.google.com/citations
Papers: 25

Attribution

Affiliations & profile: scholar.google.com/citations

25 papers · 2 eval contribs

Authored papers

25

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

arXiv 2026

RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

arXiv 2026

OSWorld-Verified: A Cleaner, More Reliable Computer-Use Benchmark

blog

Qwen2.5-VL Technical Report

arXiv 2025

Qwen3-VL Technical Report

arXiv 2025

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

arXiv 2025

xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations

arXiv 2025

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

arXiv 2025

OpenCUA: Open Foundations for Computer-Use Agents

arXiv 2025

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

arXiv 2025

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

arXiv 2025

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

NeurIPS 2024

Cradle: Empowering Foundation Agents Towards General Computer Control

arXiv 2024

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

arXiv 2024

AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

arXiv 2024

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

arXiv 2024

OpenAgents: An Open Platform for Language Agents in the Wild

arXiv 2023

Text2Reward: Reward Shaping with Language Models for Reinforcement Learning

arXiv 2023

Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction

arXiv 2023

Lemur: Harmonizing Natural Language and Code for Language Agents

arXiv 2023

UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models

arXiv 2022

In-Context Learning for Few-Shot Dialogue State Tracking

arXiv 2022

Binding Language Models in Symbolic Languages

arXiv 2022

A Survey on Spoken Language Understanding: Recent Advances and New Frontiers

arXiv 2021

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

arXiv 2021

Eval contributions

2

OSWorld-Verified

XLANG Lab

Cleaned, human-validated subset of OSWorld tasks designed for stable cross-lab comparison of computer-use agents.

Active · Computer Use · Planning · Tool Calling · Agentic

OSWorld

XLANG Lab

369 computer-use tasks across Ubuntu, Windows, and macOS environments that test whether agents can operate a real desktop via screenshots and mouse/keyboard input.

Active · Computer Use · Planning · Tool Calling · Agentic

Affiliations

Currently at

grad-student · university lab

Frequent co-authors

10

from 25 papers

Tao Yu

professor

14 shared papers

Yiheng Xu

researcher

10 shared papers

Caiming Xiong

researcher

9 shared papers

Danyang Zhang

researcher

6 shared papers

Jixuan Chen

researcher

6 shared papers

Victor Zhong

researcher

6 shared papers

Ruisheng Cao

researcher

5 shared papers

Yitao Liu

researcher

5 shared papers

Zhoujun Cheng

researcher

5 shared papers

Dongchan Shin

researcher

4 shared papers