arXiv cs.CL · 19 May 2026

CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

In three lines: CHI-Bench evaluates whether AI agents can automate complex healthcare workflows. The benchmark spans three domains (prior authorization, utilization management, care management) with 87 MCP tools and 1,290+ policy documents. Best result: 28% task resolution, and 3.8% in a single session.

AI Agents Multi-agent MCP Benchmarks Reasoning

Summary generated by Claude — human-verified
