GAIA: A Benchmark for General AI Assistants

Meta/HuggingFace benchmark of 466 real-world assistant questions requiring tools, web browsing, file I/O, and multi-step reasoning.

Publisher: Meta FAIR (Fundamental AI Research)
Year: 2023
Venue: ICLR
ArXiv: arxiv.org/abs/2311.12983
Authors: 6
Hosting: External source; license unknown

Attribution

Abstract & full text: arxiv.org/abs/2311.12983
TL;DR: semanticscholar.org/paper/ab8169d6e4dfabfe7c30ebec1bb871bf3e1551cd

Introduces 1 artifact - 1 eval

TL;DR

Semantic Scholar

GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency, and shows that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins.

Artifacts

Evals

GAIA (General AI Assistants)

Authors

Clémentine Fourrier, Craig Swift, Grégoire Mialon, Thomas Scialom, Thomas Wolf, Yann LeCun