
ALFWorld

Active

Embodied household-task benchmark that aligns TextWorld text commands with ALFRED 3D scenes, testing whether agents can transfer from abstract text policies to grounded execution.
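On the TextWorld side, an agent reads a text observation and replies with a text command until the goal is met. The sketch below illustrates that loop with a hypothetical, heavily simplified toy environment (`ToyHouseholdEnv` and `run_episode` are illustrative names, not the ALFWorld API). It covers a single pick-and-place-style goal and ignores real ALFWorld details such as opening receptacles or admissible-command lists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyHouseholdEnv:
    """Hypothetical toy text game in the style of ALFWorld's TextWorld side.

    Not the real ALFWorld environment or API: it only shows the loop of reading
    a text observation and issuing a text command for one goal
    ("put apple 1 in/on fridge 1").
    """
    goal_object: str = "apple 1"
    goal_receptacle: str = "fridge 1"
    location: str = "middle of room"
    holding: Optional[str] = None
    # Maps each object to the receptacle it currently sits in/on.
    objects: Dict[str, str] = field(default_factory=lambda: {"apple 1": "countertop 1"})

    def step(self, command: str) -> Tuple[str, bool]:
        """Apply one text command; return (observation, goal_reached)."""
        if command.startswith("go to "):
            self.location = command[len("go to "):]
            here = [o for o, r in self.objects.items() if r == self.location]
            return f"You arrive at {self.location}. You see: {', '.join(here) or 'nothing'}.", False
        if command.startswith("take "):
            # Expected form: "take <object> from <receptacle>"
            obj, _, src = command[len("take "):].partition(" from ")
            if self.holding is None and self.objects.get(obj) == src and src == self.location:
                self.holding = obj
                del self.objects[obj]
                return f"You pick up the {obj} from the {src}.", False
        if command.startswith("put "):
            # Expected form: "put <object> in/on <receptacle>"
            obj, _, dst = command[len("put "):].partition(" in/on ")
            if self.holding == obj and dst == self.location:
                self.holding = None
                self.objects[obj] = dst
                done = self.objects.get(self.goal_object) == self.goal_receptacle
                return f"You put the {obj} in/on the {dst}.", done
        # Any unrecognized or invalid command leaves the state unchanged.
        return "Nothing happens.", False


def run_episode(env: ToyHouseholdEnv, plan: List[str], max_steps: int = 50) -> bool:
    """Feed a fixed list of commands (standing in for an agent's text policy)."""
    for command in plan[:max_steps]:
        observation, done = env.step(command)
        print(f"> {command}\n{observation}")
        if done:
            return True
    return False


if __name__ == "__main__":
    plan = [
        "go to countertop 1",
        "take apple 1 from countertop 1",
        "go to fridge 1",
        "put apple 1 in/on fridge 1",
    ]
    print("success:", run_episode(ToyHouseholdEnv(), plan))
```

A real agent would choose each command from the latest observation rather than replaying a fixed plan; the fixed plan just keeps the loop easy to follow.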

Publisher: Microsoft
Domain: agentic
Format: Custom
Size: 3,553 tasks across 6 task types
License: MIT
Published: Oct 2020
Notable for: Benchmark for evaluating embodied, planning, and tool-calling capabilities in the agentic domain.


Related tools


Implementations, trainers, datasets and scaffolds linked to this eval.


FAQ

What is ALFWorld?
Embodied household-task benchmark that aligns TextWorld text commands with ALFRED 3D scenes, testing whether agents can transfer from abstract text policies to grounded execution.
What capabilities does ALFWorld test?
ALFWorld evaluates embodied, planning, tool-calling, common-sense, and multi-turn dialog capabilities.
How can a model improve its ALFWorld score?
Tools linked to ALFWorld on Sophon include ALFWorld; linked tools are RL environments, datasets, and scaffolds that target this eval.
What license is ALFWorld under?
ALFWorld is available under the MIT License.