Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation
Signal: 72
Hype: 18
In three lines
Vision-OPD introduces regional-to-global self-distillation to improve fine-grained visual understanding in multimodal LLMs (MLLMs).
The framework transfers the model's privileged perception on evidence-centered crops to its full-image policy by minimizing the KL divergence between token distributions.
It reports competitive results on fine-grained visual understanding benchmarks without external models or ground-truth labels.
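The KL-based transfer in the summary can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the toy logits, and the direction of the KL (student relative to teacher) are all assumptions, since the summary only states that a KL divergence between token distributions is minimized. Here the "teacher" logits stand in for the model's predictions when it sees the evidence-centered crop, and the "student" logits for its predictions on the full image.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kl(p_logits, q_logits):
    """KL(p || q) between two categorical distributions given as logits."""
    log_p = log_softmax(p_logits)
    log_q = log_softmax(q_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def distillation_loss(student_logits, teacher_logits):
    """Mean per-token KL(student || teacher) over a sequence.

    student_logits: per-token logits from the full-image pass (hypothetical input).
    teacher_logits: per-token logits from the crop pass, treated as a fixed target.
    The KL direction is an assumption; the summary does not specify it.
    """
    if len(student_logits) != len(teacher_logits):
        raise ValueError("sequences must have the same number of tokens")
    kls = [token_kl(s, t) for s, t in zip(student_logits, teacher_logits)]
    return sum(kls) / len(kls)


# Toy data (made up): 2 tokens, vocabulary of 3.
teacher = [[4.0, 0.0, 0.0], [0.0, 3.0, 0.0]]  # confident, crop-conditioned
student = [[1.0, 0.5, 0.0], [0.2, 0.4, 0.1]]  # diffuse, full-image-conditioned

loss = distillation_loss(student, teacher)
identical = distillation_loss(teacher, teacher)
```

In this sketch the loss is positive whenever the two passes disagree and zero when they match, so minimizing it pulls the full-image distribution toward the crop-conditioned one. A real training loop would compute it on tokens sampled from the student itself (the "on-policy" part) and backpropagate only through the student pass.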
Summary generated by Claude — human-verified