arXiv cs.CL

CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

Signal: 82
Hype: 15
In three lines: CHI-Bench evaluates whether AI agents can automate complex healthcare workflows. The benchmark spans 3 domains (prior authorization, utilization management, care management) with 87 MCP tools and 1,290+ policy documents. Best result: 28% task resolution, 3.8% in a single session.
AI Agents, Multi-agent, MCP, Benchmarks, Reasoning

Summary generated by Claude — human-verified