Reading or Guessing? Visual Grounding Failures of Vision-Language Models for OCR in Ancient Greek Editions
Signal: 72
Hype: 15
In three lines: Comparative study of Vision-Language Models versus traditional OCR on low-resource Ancient Greek critical editions. VLMs generate plausible but visually unsupported text, revealing excessive reliance on language priors. Image perturbations and token-level grounding measures show fluent errors persist even without visual signal.
Summary generated by Claude — human-verified