GAIA: A Benchmark for General AI Assistants

Active

GAIA proposes real-world questions that require a set of fundamental abilities, such as reasoning, multi-modality handling, web browsing, and general tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs.

Domain
Assistants
License
MIT
Published
Oct 2024
Notable for
Benchmark for evaluating general AI assistants.

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

Papers

1

FAQ

What is GAIA: A Benchmark for General AI Assistants?
GAIA proposes real-world questions that require a set of fundamental abilities, such as reasoning, multi-modality handling, web browsing, and general tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs.
How can a model improve its GAIA: A Benchmark for General AI Assistants score?
Tools linked to GAIA: A Benchmark for General AI Assistants on Sophon include GAIA RL Env (Browserbase). Linked tools cover RL environments, datasets, and scaffolds that target this eval.
What license is GAIA: A Benchmark for General AI Assistants under?
GAIA: A Benchmark for General AI Assistants is available under the MIT license.