MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents
Signal: 82
Hype: 15
In three lines
MedCUA-Bench is an interactive benchmark for evaluating computer-use agents in clinical interfaces. It covers 18 medical scenarios across 10 domains with authentic interfaces. The best closed-source models reach 54.2% strict success, while open-source agents average 2.5%, exposing a major gap between current agents and the reliability that clinical use requires.
Summary generated by Claude — human-verified