GAIA: A Benchmark for General AI Assistants
Meta/HuggingFace benchmark of 466 real-world assistant questions requiring tools, web browsing, file I/O, and multi-step reasoning.
- Publisher: Meta FAIR (Fundamental AI Research)
- Year: 2023
- Venue: ICLR
- Authors: 6
- Hosting: External source, license unknown
Introduces 1 artifact: 1 eval
TL;DR
Semantic Scholar
GAIA proposes real-world questions that require a set of fundamental abilities, such as reasoning, multi-modality handling, web browsing, and general tool-use proficiency, and shows that human respondents score 92% versus 15% for GPT-4 equipped with plugins.