GAIA: A Benchmark for General AI Assistants

Meta/HuggingFace benchmark of 466 real-world assistant questions requiring tools, web browsing, file I/O, and multi-step reasoning.

Year: 2023
Venue: ICLR
Authors: 6
Hosting: External source, license unknown


Introduces 1 artifact - 1 eval

TL;DR

Semantic Scholar

GAIA proposes real-world questions that require a set of fundamental abilities: reasoning, multi-modality handling, web browsing, and general tool-use proficiency. Human respondents score 92% on these questions, versus 15% for GPT-4 equipped with plugins.
