Medical Context Distorts Decisions in Clinical Vision Language Models
Signal: 72
Hype: 18
In three lines
An arXiv study identifies three critical failure modes of vision-language models (VLMs) in clinical settings: over-reliance on text over images, dependence on irrelevant clinical history, and sensitivity to semantically equivalent prompts. Tests on MIMIC-CXR show that VLM decisions are dominated by the text modality even when visual evidence is available.
Summary generated by Claude — human-verified