End-to-end automation of realistic healthcare operations stresses three capabilities underrepresented in current benchmarks: policy density (decisions must be grounded in a large library of medical, insurance, and operational rules); multi-role composition (a single task requires the agent to play multiple roles, with handoffs between them); and multilateral interaction (intermediate workflow steps are multi-turn dialogs, such as peer-to-peer review and patient outreach). We introduce χ-Bench, a benchmark of long-horizon healthcare workflows across three domains: provider prior authorization, payer utilization management, and care management. Each task hands the agent a clinical case in a high-fidelity simulator of 20 healthcare apps exposed via 87 MCP tools; the agent must drive the case to a terminal status by calling tools and writing the role's artifacts, guided by a managed-care operations handbook skill of 1,290+ documents. Across 30 agent harness/model configurations, the best agent resolves only 28.0% of tasks, no agent clears 20% on strict pass^3, and executing all tasks in a single session drops performance to 3.8%. These results lead us to hypothesize that similar gaps are likely to surface in other policy-dense, role-composed, irreversible enterprise domains.
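The abstract reports "strict pass^3" without defining it. A common definition (from τ-bench) is pass^k: the probability that k independent runs of the same task all succeed, averaged over tasks, estimated per task as C(c, k) / C(n, k) from c successes in n runs. The sketch below assumes χ-Bench follows that definition; the function name `pass_hat_k` and the toy results are hypothetical and are not taken from the paper.

```python
from math import comb


def pass_hat_k(trials: list[list[bool]], k: int) -> float:
    """Estimate pass^k over a set of tasks.

    Assumption: tau-bench-style pass^k, i.e. the chance that k i.i.d. runs
    of a task all succeed, averaged over tasks. `trials[i]` holds the
    success flags of every run of task i.
    """
    total = 0.0
    for runs in trials:
        n, c = len(runs), sum(runs)
        if n < k:
            raise ValueError("each task needs at least k runs")
        # C(c, k) / C(n, k): probability that k runs drawn without replacement
        # from the n observed runs are all successes (math.comb returns 0 when k > c)
        total += comb(c, k) / comb(n, k)
    return total / len(trials)


# Hypothetical results: three tasks, three runs each
results = [[True, True, True], [True, False, True], [False, False, False]]
print(pass_hat_k(results, 1))  # average per-run success rate
print(pass_hat_k(results, 3))  # fraction of tasks solved in all three runs
```

When each task is run exactly k times, pass^k reduces to the fraction of tasks solved on every run, which is one plausible reading of "strict"; pass^1 reduces to the mean per-run success rate.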
CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?
A healthcare workflow benchmark that challenges agents with policy-dense, multi-role, and multilateral interaction requirements, revealing significant performance gaps in automating enterprise workflows.
- Year
- 2026
- Venue
- arXiv 2026
- Stars
- 37
- Authors
- 33
- Hosting
- Abstract only
Attribution
- Abstract & full text
- arxiv.org/abs/2605.16679
- TL;DR
- Semantic Scholar
Authors
Caiming Xiong, Sanmi Koyejo, Zixian Ma, Chenyu You, Qingsong Wen, Philip S. Yu, Yuan Yuan, Eric P. Xing, Kun Zhang, Haolin Chen, Weiran Yao, Zhiwei Liu, Yue Zhao, Hang Jiang, Frank Wang, Hua Wei, Biwei Huang, Carl Yang, Deon Metelski, Leon Qi, Tao Xia, Joonyul Lee, Steve Brown, Kevin Riley, T. Y. Alvin Liu, Hank Capps MD, Zeyu Tang, Xiangchen Song, Lingjing Kong, Fan Feng, Tianyi Zeng, Fangli Geng, Yanjie Fu