arXiv cs.CL · 19 May 2026

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

In three lines: Vision-OPD introduces regional-to-global self-distillation to improve fine-grained visual understanding in multimodal LLMs (MLLMs). The framework transfers the model's privileged perception of evidence-centered crops to its full-image policy by minimizing a token-level KL divergence on on-policy rollouts. The paper reports competitive results on fine-grained visual understanding benchmarks without external models or ground-truth labels.
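
To make the token-level objective concrete, here is a minimal sketch, not the paper's implementation. It assumes the same model is scored twice on one on-policy rollout: once conditioned on the full image (the student) and once on an evidence-centered crop (the teacher). The KL direction (student relative to teacher), the mean reduction over tokens, the toy logits, and all function names are assumptions made for illustration; the summary does not specify them.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_kl(student_logits: Sequence[float],
             teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) at a single token position.

    Assumption: reverse KL, as is common in on-policy distillation.
    """
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(s, t))


def self_distillation_loss(student_seq: Sequence[Sequence[float]],
                           teacher_seq: Sequence[Sequence[float]]) -> float:
    """Mean per-token KL over one rollout sampled from the student."""
    if not student_seq or len(student_seq) != len(teacher_seq):
        raise ValueError("rollouts must be non-empty and equally long")
    kls = [token_kl(s, t) for s, t in zip(student_seq, teacher_seq)]
    return sum(kls) / len(kls)


# Toy rollout: 2 token positions, vocabulary of 3 tokens (made-up numbers).
# Student logits: the model conditioned on the full image.
student = [[1.0, 0.5, 0.2], [0.3, 0.3, 0.3]]
# Teacher logits: the same model conditioned on the evidence-centered crop.
teacher = [[2.5, 0.1, 0.0], [0.0, 1.8, 0.1]]

loss = self_distillation_loss(student, teacher)
print(f"mean token-level KL: {loss:.4f}")
```

In a real training loop the teacher distribution would typically be treated as a fixed target, with gradients flowing only through the full-image pass; that detail is also an assumption here.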

Vision · Reinforcement learning · Papers

Summary generated by Claude — human-verified
