OfficeQA
OfficeQA is a benchmark for evaluating AI agents on grounded, multi-document reasoning over a large and heterogeneous document corpus. The corpus consists of U.S. Treasury Bulletins spanning nearly 100 years, comprising 89,000 pages and over 26 million numerical values. Office…
- Type: RL Env
- Publisher: General Reasoning
- Runtime: ORS
- License: unknown
- Size: 379 tasks
- Published: Mar 2026
Public scores on this env
24 vf-eval reports across 2 models