
GAIA (General AI Assistants)

Active

466 real-world questions that require tool use, multi-step reasoning, and web browsing. They are easy for humans (~92% accuracy) but hard for AI assistants.

Open
Domain: agentic
Format: HF Dataset
Size: 466 tasks
License: Apache-2.0
Published: Nov 2023
Updates: Live
Notable for: The reference leaderboard for general-purpose AI assistants. Meta-FAIR's GAIA benchmark, hosted by Hugging Face, is used by agent frameworks including AutoGPT, LangChain, Trase, and HuggingFace's smolagents.

Where it's ranked: 1

Related tools: 1
Implementations, trainers, datasets and scaffolds linked to this eval.

Papers: 2

Contributors: 3

FAQ

What is GAIA (General AI Assistants)?
466 real-world questions that require tool use, multi-step reasoning, and web browsing. They are easy for humans (~92% accuracy) but hard for AI assistants.
What capabilities does GAIA (General AI Assistants) test?
GAIA (General AI Assistants) evaluates tool calling, browser use, planning, and multi-turn dialog.
How can a model improve its GAIA (General AI Assistants) score?
Tools linked to GAIA (General AI Assistants) on Sophon (RL environments, datasets, and scaffolds that target this eval) include GAIA RL Env (Browserbase).
What license is GAIA (General AI Assistants) under?
GAIA (General AI Assistants) is available under Apache-2.0.