GAIA (General AI Assistants)
Active
A benchmark of 466 real-world questions that require tool use, multi-step reasoning, and web browsing. The questions are easy for humans (~92% correct) but hard for AI assistants.
- Publisher: Meta FAIR (Fundamental AI Research)
- Capabilities: Tool Calling, Browser Use, Planning, Multi Turn Dialog
- Domain: agentic
- Format: HF Dataset
- Size: 466 tasks
- License: Apache-2.0
- Published: Nov 2023
- Updates: Live
- Notable for: The reference leaderboard for general-purpose AI assistants. The benchmark comes from Meta FAIR, is hosted on Hugging Face, and is used by agent frameworks including AutoGPT, LangChain, Trase, and Hugging Face's smolagents.
- Canonical: huggingface.co/gaia-benchmark
- Official leaderboard: huggingface.co/spaces/gaia-benchmark/leaderboard
FAQ
- What is GAIA (General AI Assistants)?
- GAIA is a benchmark of 466 real-world questions that require tool use, multi-step reasoning, and web browsing. The questions are easy for humans (~92% correct) but hard for AI assistants.
- What capabilities does GAIA (General AI Assistants) test?
- GAIA (General AI Assistants) evaluates tool calling, browser use, planning, and multi-turn dialog.
- How can a model improve its GAIA (General AI Assistants) score?
- Sophon links tools that target this eval, such as RL environments, datasets, and scaffolds. One example linked to GAIA (General AI Assistants) is GAIA RL Env (Browserbase).
- What license is GAIA (General AI Assistants) under?
- GAIA (General AI Assistants) is available under the Apache-2.0 license.