arXiv cs.CL · 19 May 2026

FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing

In three lines: FastOCR is a training-free framework that accelerates OCR in vision-language models by exploiting dynamic visual fixation. By pruning the KV cache, it cuts the visual tokens processed at each decoding step to 5% while retaining 98% accuracy on Qwen2.5-VL, for a 3.0× reduction in attention latency.
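To make the idea of per-step KV cache pruning concrete, here is a minimal pure-Python sketch. It is not FastOCR's actual algorithm: the summary does not say how the kept tokens are chosen, so the selection rule here (top-k by query-key dot product) is an assumption, and the names `select_visual_tokens`, `pruned_attention`, and `keep_ratio` are hypothetical. For clarity the sketch scores every cached key before pruning; a practical method would presumably need a cheaper way to pick the fixation region.

```python
import math


def select_visual_tokens(query, visual_keys, keep_ratio=0.05):
    """Pick the visual tokens most relevant to the current decoding step.

    Assumption: relevance is the raw query-key dot product. Returns the
    kept cache positions in ascending order.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in visual_keys]
    n_keep = max(1, round(keep_ratio * len(visual_keys)))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_keep]
    return sorted(top)


def pruned_attention(query, keys, values, keep_idx):
    """Scaled dot-product attention restricted to the kept cache entries."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, keys[i])) for i in keep_idx]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w / total * values[i][d] for w, i in zip(weights, keep_idx))
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Toy cache: 100 visual tokens, five of which align with the query.
    query = [1.0, 0.0]
    keys = [[0.0, 1.0] for _ in range(100)]
    for i in (7, 42, 43, 44, 90):
        keys[i] = [5.0, 0.0]
    values = [[float(i)] for i in range(100)]

    kept = select_visual_tokens(query, keys, keep_ratio=0.05)
    print(len(kept), "of", len(keys), "visual tokens kept")
    print(pruned_attention(query, keys, values, kept))
```

With `keep_ratio=0.05`, attention at each step touches only one in twenty cached visual entries, which is where a latency saving of the kind reported would come from.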

Vision · Reasoning · Benchmarks · Code generation

Summary generated by Claude — human-verified
