arXiv cs.CL

Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models

Signal: 78
Hype: 25
In three lines: Study of 6,233 MedGPTs and 10 open-source models deployed on the web. 25-30% show low factual accuracy, 33.6-54.3% violate operational thresholds, and 57% of Action-enabled models lack privacy disclosures. The authors introduce MedGPT-HEval for hallucination detection and release HAA-MedGPT, a structured dataset.
AI safety, Alignment, Evals, Benchmarks, Regulation

Summary generated by Claude — human-verified