ALFWorld
Active
Embodied household-task benchmark that aligns TextWorld text commands with ALFRED 3D scenes, testing whether agents can transfer from abstract text policies to grounded execution.
- Publisher
- Microsoft
- Domain
- agentic
- Format
- Custom
- Size
- 3,553 tasks across 6 task types
- License
- MIT
- Published
- Oct 2020
- Notable for
- Benchmark for evaluating embodied interaction, planning, and tool calling in the agentic domain.
- Canonical
- alfworld.github.io
Related tools (1)
Implementations, trainers, datasets and scaffolds linked to this eval.
Papers (2)
FAQ
- What is ALFWorld?
- Embodied household-task benchmark that aligns TextWorld text commands with ALFRED 3D scenes, testing whether agents can transfer from abstract text policies to grounded execution.
- What capabilities does ALFWorld test?
- ALFWorld evaluates embodied interaction, planning, tool calling, common-sense reasoning, and multi-turn dialog.
- How can a model improve its ALFWorld score?
- Tools linked to ALFWorld on Sophon include ALFWorld itself. Linked tools are RL environments, datasets, and scaffolds that target this eval.
- What license is ALFWorld under?
- ALFWorld is available under the MIT license.
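
As a rough illustration of the text-level interaction described above, the sketch below models one household task as an observation/command loop and computes a success rate over episodes. Everything in it is an assumption made for illustration: `ToyHouseholdEnv`, `run_episode`, `success_rate`, and the command strings are hypothetical and are not the ALFWorld or TextWorld API.

```python
from dataclasses import dataclass


@dataclass
class ToyHouseholdEnv:
    """Hypothetical toy task ("put the apple in the fridge"); NOT the ALFWorld API."""
    location: str = "countertop"
    holding_apple: bool = False
    done: bool = False

    def reset(self) -> str:
        # Restore the initial state and return the first text observation.
        self.location, self.holding_apple, self.done = "countertop", False, False
        return "You are at the countertop. You see an apple. Task: put the apple in the fridge."

    def admissible_commands(self) -> list[str]:
        # Only commands that make sense in the current state are offered.
        other = "fridge" if self.location == "countertop" else "countertop"
        commands = [f"go to {other}"]
        if self.location == "countertop" and not self.holding_apple:
            commands.append("take apple")
        if self.location == "fridge" and self.holding_apple:
            commands.append("put apple in fridge")
        return commands

    def step(self, command: str) -> tuple[str, bool]:
        # Inadmissible commands leave the state unchanged.
        if command not in self.admissible_commands():
            return "Nothing happens.", self.done
        if command.startswith("go to "):
            self.location = command[len("go to "):]
            return f"You arrive at the {self.location}.", self.done
        if command == "take apple":
            self.holding_apple = True
            return "You pick up the apple.", self.done
        # Only remaining admissible command: "put apple in fridge".
        self.holding_apple, self.done = False, True
        return "You put the apple in the fridge.", self.done


def run_episode(env: ToyHouseholdEnv, plan: list[str], max_steps: int = 10) -> bool:
    """Execute a fixed list of text commands; return True if the task is solved."""
    env.reset()
    for command in plan[:max_steps]:
        _, done = env.step(command)
        if done:
            return True
    return False


def success_rate(outcomes: list[bool]) -> float:
    """Fraction of episodes that ended in success."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```

A plan such as `["take apple", "go to fridge", "put apple in fridge"]` solves the toy task, while skipping the pickup step does not. A real agent would choose each command from the observation rather than follow a fixed plan.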