arXiv cs.CL

FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing

Signal: 78 · Hype: 15
In three lines: FastOCR introduces a training-free framework that accelerates OCR on Vision-Language Models by exploiting dynamic visual fixation. Through KV cache pruning, it reduces the visual tokens processed at each decoding step to 5% while retaining 98% accuracy on Qwen2.5-VL, for a 3.0× reduction in attention latency.
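
To make the pruning idea concrete, here is a minimal sketch of per-step visual token selection. The summary does not say how FastOCR chooses which tokens to keep, so this example assumes a generic top-k selection by dot-product score between the current query and the cached visual keys. The function names `select_visual_tokens` and `attend`, and the `keep_ratio` parameter, are hypothetical and not taken from the paper.

```python
import math


def select_visual_tokens(query, visual_keys, keep_ratio=0.05):
    """Return sorted indices of the cached visual keys that score highest
    against the current query. Assumed criterion: raw dot product."""
    keep = max(1, math.ceil(keep_ratio * len(visual_keys)))
    scores = [sum(q * k for q, k in zip(query, key)) for key in visual_keys]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def attend(query, keys, values, indices):
    """Scaled dot-product attention restricted to the given cache indices."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, keys[i])) for i in indices]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, indices)) / total
        for d in range(dim)
    ]


# Toy cache: 8 visual tokens with 2-dimensional keys and 1-dimensional values.
keys = [[float(i), 0.0] for i in range(8)]
values = [[float(i)] for i in range(8)]
query = [1.0, 0.0]

kept = select_visual_tokens(query, keys, keep_ratio=0.25)
output = attend(query, keys, values, kept)
```

Because the selection is recomputed from the current query, the kept subset can change from one decoding step to the next, which is one plausible reading of "dynamic visual fixation". Attention cost then scales with the number of kept tokens rather than with the full visual cache.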
Vision · Reasoning · Benchmarks · Code generation

Summary generated by Claude — human-verified