arXiv cs.CL

Medical Context Distorts Decisions in Clinical Vision Language Models

Signal
72
Hype
18
In three lines
An arXiv study identifies three critical failure modes of vision-language models (VLMs) in clinical settings: over-reliance on text over images, dependence on irrelevant clinical history, and sensitivity to prompt wording across semantically equivalent inputs. Tests on MIMIC-CXR show that VLM decisions are dominated by the text modality even when visual evidence is available.
Vision · AI safety · Evals · Papers

Summary generated by Claude — human-verified