GAIA: A Benchmark for General AI Assistants
Active
Proposes real-world questions that require a set of fundamental abilities: reasoning, multi-modality handling, web browsing, and general tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AI systems.
- Publisher: Meta FAIR (Fundamental AI Research)
- Domain: Assistants
- License: MIT
- Published: Oct 2024
- Notable for: Benchmark for evaluating assistants.
Related tools (1)
Implementations, trainers, datasets, and scaffolds linked to this eval.
Papers (1)
FAQ
- What is GAIA: A Benchmark for General AI Assistants?
- GAIA is a benchmark of real-world questions that require a set of fundamental abilities: reasoning, multi-modality handling, web browsing, and general tool-use proficiency. Its questions are conceptually simple for humans yet challenging for most advanced AI systems.
- How can a model improve its GAIA: A Benchmark for General AI Assistants score?
- Tools linked to GAIA: A Benchmark for General AI Assistants on Sophon include GAIA RL Env (Browserbase). Linked tools cover RL environments, datasets, and scaffolds that target this eval.
- What license is GAIA: A Benchmark for General AI Assistants under?
- GAIA: A Benchmark for General AI Assistants is available under the MIT license.