What capabilities does GAIA (General AI Assistants) test?

GAIA (General AI Assistants) evaluates tool calling, browser use, planning, and multi-turn dialog.

How can a model improve its GAIA (General AI Assistants) score?

Tools linked to GAIA (General AI Assistants) on Sophon include GAIA RL Env (Browserbase). Linked tools cover RL environments, datasets, and scaffolds that target this eval.

What license is GAIA (General AI Assistants) under?

GAIA (General AI Assistants) is available under Apache-2.0.

GAIA (General AI Assistants)

Active

466 real-world questions requiring tool use, multi-step reasoning, and web browsing. They are easy for humans (~92% accuracy) but hard for AI assistants.

Open

Publisher: Meta FAIR (Fundamental AI Research)
Capabilities: Tool Calling, Browser Use, Planning, Multi-Turn Dialog
Domain: agentic
Format: HF Dataset
Size: 466 tasks
License: Apache-2.0
Published: Nov 2023
Updates: Live
Notable for: The reference leaderboard for general-purpose AI assistants. Meta FAIR's GAIA benchmark is hosted on Hugging Face and used by agent frameworks including AutoGPT, LangChain, Trase, and Hugging Face's smolagents.
Canonical: huggingface.co/gaia-benchmark
Official leaderboard: huggingface.co/spaces/gaia-benchmark/leaderboard
Also on: huggingface.co/datasets/gaia-benchmark/GAIA

Where it's ranked

Official leaderboard

huggingface.co

Single benchmark

live

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

GAIA RL Env (Browserbase)

Browserbase

GAIA web browser benchmark for multi-hop question answering with web navigation

Tags: Implementation, RL Env, Browser, Browserbase, Gaia

Papers

GAIA: A Benchmark for General AI Assistants

ICLR · 2023

Meta/HuggingFace benchmark of 466 real-world assistant questions requiring tools, web browsing, file I/O, and multi-step reasoning.

introduces

Contributors

Grégoire Mialon, Yann LeCun, Thomas Wolf

FAQ

What is GAIA (General AI Assistants)?: 466 real-world questions requiring tool use, multi-step reasoning, and web browsing - easy for humans (~92%) but hard for AI assistants.