Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation
Signal
78
Hype
25
In three lines
Vision-OPD introduces regional-to-global self-distillation to improve fine-grained visual understanding in multimodal LLMs (MLLMs). The framework transfers the model's privileged perception of evidence-centered crops to its full-image policy by minimizing token-level KL divergence on on-policy rollouts. It reports competitive results on fine-grained visual understanding benchmarks without external models or ground-truth labels.
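To make the token-level KL term in the summary above concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the toy logits, and the direction of the divergence (KL from the full-image "student" distribution to the crop-conditioned "teacher" distribution) are assumptions chosen for illustration. In the setup the summary describes, both distributions would come from the same model, scored on tokens the full-image policy sampled itself.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kl(student_logits, teacher_logits):
    """KL(student || teacher) at a single token position.

    Assumption: the divergence direction is illustrative; the summary
    does not state which direction the paper uses.
    """
    log_p = log_softmax(student_logits)  # full-image policy (hypothetical)
    log_q = log_softmax(teacher_logits)  # crop-conditioned view (hypothetical)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def distillation_loss(student_seq, teacher_seq):
    """Mean token-level KL over the positions of one sampled rollout."""
    if len(student_seq) != len(teacher_seq):
        raise ValueError("both views must score the same rollout tokens")
    kls = [token_kl(s, t) for s, t in zip(student_seq, teacher_seq)]
    return sum(kls) / len(kls)


# Toy rollout: two token positions, vocabulary of three tokens.
student_seq = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher_seq = [[2.5, 0.0, -1.5], [0.1, 0.2, 0.3]]
loss = distillation_loss(student_seq, teacher_seq)
```

In this toy case the second position contributes zero because both views agree there, so only positions where the crop changes the prediction produce a training signal.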
Summary generated by Claude — human-verified