arXiv cs.AI

CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

Signal: 82 · Hype: 15
In three lines: CHI-Bench evaluates AI agents' ability to automate complex healthcare workflows (prior authorization, utilization management, care management) across 87 MCP tools and 20 applications. The best agent resolves only 28% of tasks, and none exceeds 20% on strict pass. Performance drops to 3.8% in single-session mode.
AI Agents · MCP · Benchmarks · Multi-agent · Reasoning

Summary generated by Claude — human-verified