arXiv cs.CL · 21 May 2026

Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models

In three lines: A study of 6,233 MedGPTs and 10 open-source models deployed on the web. 25-30% show low factual accuracy, 33.6-54.3% violate operational thresholds, and 57% of Action-enabled models lack privacy disclosures. The authors introduce MedGPT-HEval for hallucination detection and release HAA-MedGPT, a structured dataset.

AI safety, Alignment, Evals, Benchmarks, Regulation

Summary generated by Claude — human-verified
