Same Patient, Different Words, Different Diagnosis? Evaluating Semantic Stability in Clinical LLMs
Signal: 78
Hype: 15
In three lines
Evaluation of semantic stability in 16 LLMs (general-purpose and medical) under clinically equivalent prompt reformulations. Proposes NLI-based verification framework and three sensitivity metrics (MVS, ΔC, WCI). Finding: domain specialization does not consistently improve robustness to meaning-preserving variations.
Summary generated by Claude — human-verified