Androidworld

AndroidWorld is a benchmark for evaluating autonomous agents on real Android apps, with 116 tasks across 20 apps.

Domain: rl-env
License: apache-2.0
Published: Mar 2026

Attribution

Leaderboard scores: prime-hub

Top score: 26.7% by GPT-4.1 (1 model reporting, 1 from frontier labs)

Top models

Bar chart of top models (1 model): GPT-4.1 at 26.7%.

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is Androidworld?
AndroidWorld is a benchmark for evaluating autonomous agents on real Android apps, with 116 tasks across 20 apps.
What is the current top score on Androidworld?
The top reported score is 26.7% by GPT-4.1, across 1 model reporting (1 from frontier labs).
How can a model improve its Androidworld score?
Sophon links tools that target this eval, such as RL environments, datasets, and scaffolds. The tools linked to Androidworld include Androidworld RL Env (Prime Community).
What license is Androidworld under?
Androidworld is available under the Apache-2.0 license.