Tao Yu

Assistant professor at HKU and head of the XLANG Lab; advisor on OSWorld, Spider, OpenAgents, Aguvis, and OpenCUA.

Role: professor
Currently at: HKU XLANG Lab
Papers: 50


50 papers · 1 eval contribs

Authored papers

50

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

arXiv 2026

2026

Monocular Mesh Recovery and Body Measurement of Female Saanen Goats

arXiv 2026

2026

Kimi K2.5: Visual Agentic Intelligence

arXiv 2026

2026

OSWorld-Verified: A Cleaner, More Reliable Computer-Use Benchmark

blog

2025

Aligning Multimodal LLM with Human Preference: A Survey

arXiv 2025

2025

Kimi-VL Technical Report

arXiv 2025

2025

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

arXiv 2025

2025

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

arXiv 2025

2025

OpenCUA: Open Foundations for Computer-Use Agents

arXiv 2025

2025

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

arXiv 2025

2025

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

NeurIPS

2024

OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

arXiv 2024

2024

Generative Representational Instruction Tuning

arXiv 2024

2024

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

arXiv 2024

2024

EVOR: Evolving Retrieval for Code Generation

arXiv 2024

2024

VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark

arXiv 2024

2024

Attacking Vision-Language Computer Agents via Pop-ups

arXiv 2024

2024

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

arXiv 2024

2024

Fast Segment Anything

arXiv 2023

2023

Inpaint Anything: Segment Anything Meets Image Inpainting

arXiv 2023

2023

Fluctuation-based Adaptive Structured Pruning for Large Language Models

arXiv 2023

2023

StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video

arXiv 2023

2023

Text2Reward: Reward Shaping with Language Models for Reinforcement Learning

arXiv 2023

2023

OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning

CVPR 2024 1

2023

Compositional Exemplars for In-context Learning

arXiv 2023

2023

Batch Prompting: Efficient Inference with Large Language Model APIs

arXiv 2023

2023

ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection

CVPR 2023 1

2023

Shadow Cones: A Generalized Framework for Partial Order Embeddings

arXiv 2023

2023

OpenAgents: An Open Platform for Language Agents in the Wild

arXiv 2023

2023

Generating Data for Symbolic Language with Large Language Models

arXiv 2023

2023

Lemur: Harmonizing Natural Language and Code for Language Agents

arXiv 2023

2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

2022

FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset

CVPR 2022 1

2022

DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation

arXiv 2022

2022

UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models

arXiv 2022

2022

FOLIO: Natural Language Reasoning with First-Order Logic

arXiv 2022

2022

Selective Annotation Makes Language Models Better Few-Shot Learners

arXiv 2022

2022

Coder Reviewer Reranking for Code Generation

arXiv 2022

2022

In-Context Learning for Few-Shot Dialogue State Tracking

arXiv 2022

2022

One Embedder, Any Task: Instruction-Finetuned Text Embeddings

arXiv 2022

2022

ZeroGen: Efficient Zero-shot Learning via Dataset Generation

arXiv 2022

2022

Binding Language Models in Symbolic Languages

arXiv 2022

2022

SummerTime: Text Summarization Toolkit for Non-experts

EMNLP (ACL) 2021 11

2021

QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization

NAACL 2021 4

2021

GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing

arXiv 2020

2020

Simplifying Graph Convolutional Networks

arXiv 2019

2019

SParC: Cross-Domain Semantic Parsing in Context

ACL 2019

2019

Region Normalization for Image Inpainting

arXiv 2019

2019

TypeSQL: Knowledge-based Type-Aware Neural Text-to-SQL Generation

arXiv 2018

2018

Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task

EMNLP 2018

2018

Eval contributions

1

Affiliations

Currently at

HKU XLANG Lab

professor · university lab

Frequent co-authors

10

from 50 papers