arXiv cs.AI·19 May 2026

CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

In three lines: CHI-Bench evaluates AI agents' ability to automate complex healthcare workflows (prior authorization, utilization management, care management) across 87 MCP tools and 20 applications. The best agent resolves only 28% of tasks, and none exceeds 20% on strict pass. Performance drops to 3.8% in single-session mode.

Tags: AI Agents, MCP, Benchmarks, Multi-agent, Reasoning

Summary generated by Claude — human-verified
