OfficeQA

OfficeQA is a benchmark for evaluating AI agents on grounded, multi-document reasoning over a large and heterogeneous document corpus. The corpus consists of U.S. Treasury Bulletins spanning nearly 100 years, comprising 89,000 pages and over 26 million numerical values. Office…

Type: RL Env
Runtime: ORS
License: unknown
Size: 379 tasks
Published: Mar 2026
