OfficeQA

OfficeQA is a benchmark for evaluating AI agents on grounded, multi-document reasoning over a large and heterogeneous document corpus. The corpus consists of U.S. Treasury Bulletins spanning nearly 100 years, comprising 89,000 pages and over 26 million numerical values. Office…

Domain: rl-env
License: unknown
Published: Mar 2026

Attribution

Leaderboard scores: OpenReward

Top models

[Bar chart: OfficeQA scores for 2 models. Highest value: GPT-5.1 Agent at 43.5.]

FAQ

What is OfficeQA?
OfficeQA is a benchmark for evaluating AI agents on grounded, multi-document reasoning over a large and heterogeneous document corpus. The corpus consists of U.S. Treasury Bulletins spanning nearly 100 years, comprising 89,000 pages and over 26 million numerical values. Office…
What license is OfficeQA under?
The license for OfficeQA is unknown.