arXiv cs.CL

Reading or Guessing? Visual Grounding Failures of Vision-Language Models for OCR in Ancient Greek Editions

Signal: 72 · Hype: 15
In three lines: A comparative study of Vision-Language Models (VLMs) and traditional OCR on low-resource Ancient Greek critical editions. VLMs generate plausible but visually unsupported text, revealing excessive reliance on language priors. Image perturbations and token-level grounding measures show that fluent errors persist even when the visual signal is removed.
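The summary does not spell out how the token-level grounding measure is computed. As a minimal sketch of one way such a measure could work, assume we can obtain the VLM's log-probability for each output token twice: once with the real page image and once with a perturbed image (for example, a blanked page). A token whose likelihood barely drops when the image is removed is being predicted from language priors rather than read from the page. The names `TokenScore`, `grounding_gaps`, `flag_prior_driven`, `ungrounded_rate`, the 0.5 threshold, and all log-probability values below are hypothetical illustrations, not the paper's method or results.

```python
from dataclasses import dataclass


@dataclass
class TokenScore:
    # Hypothetical record: one generated token and its log-probabilities
    # under the original and the perturbed image.
    token: str
    logp_image: float      # log p(token | real page image, prefix)
    logp_perturbed: float  # log p(token | perturbed image, prefix)


def grounding_gaps(scores):
    """Per-token drop in log-probability when the visual signal is removed."""
    return [s.logp_image - s.logp_perturbed for s in scores]


def flag_prior_driven(scores, threshold=0.5):
    """Tokens whose likelihood barely depends on the image (gap below threshold)."""
    return [s.token for s, gap in zip(scores, grounding_gaps(scores)) if gap < threshold]


def ungrounded_rate(scores, threshold=0.5):
    """Fraction of tokens flagged as prior-driven; 0.0 for an empty sequence."""
    if not scores:
        return 0.0
    return len(flag_prior_driven(scores, threshold)) / len(scores)


# Illustrative values only, not measurements from the paper.
scores = [
    TokenScore("λόγος", -0.2, -3.1),  # large drop: likely read from the image
    TokenScore("καὶ", -0.1, -0.3),    # small drop: predictable from language alone
    TokenScore("τοῦ", -0.4, -0.6),    # small drop: predictable from language alone
]

print(flag_prior_driven(scores))
print(ungrounded_rate(scores))
```

Comparing the same token under two image conditions isolates the image's contribution while holding the text prefix fixed; a high ungrounded rate on transcriptions that still read fluently would match the "fluent errors without visual signal" pattern the summary describes.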
Tags: Vision, Evals, Papers

Summary generated by Claude — human-verified