arXiv cs.CL · 28 May 2026

Reading or Guessing? Visual Grounding Failures of Vision-Language Models for OCR in Ancient Greek Editions

In three lines: Comparative study of Vision-Language Models versus traditional OCR on low-resource Ancient Greek critical editions. VLMs generate plausible but visually unsupported text, revealing excessive reliance on language priors. Image perturbations and token-level grounding measures show fluent errors persist even without visual signal.

Vision Evals Papers

Summary generated by Claude — human-verified
