The Agent Company: Evaluating multi-tool autonomous agents in a synthetic company
Active
The Agent Company benchmark evaluates autonomous agents in a realistic, self-contained company environment. Tasks require browsing internal web services, reading and writing files, running code, and coordinating tools to solve multi-step problems.
- Publisher: Carnegie Mellon University
- Domain: Assistants
- License: MIT
- Published: Apr 2026
- Notable for: Benchmark for evaluating assistants.
FAQ
- What is The Agent Company?
- The Agent Company benchmark evaluates autonomous agents in a realistic, self-contained company environment. Tasks require browsing internal web services, reading and writing files, running code, and coordinating tools to solve multi-step problems.
- What license is The Agent Company released under?
- The Agent Company is available under the MIT License.