arXiv cs.AI

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

Signal: 72
Hype: 18
In three lines: Vision-OPD introduces regional-to-global self-distillation to improve fine-grained visual understanding in multimodal LLMs (MLLMs). The framework transfers the model's privileged perception on evidence-centered crops to its own full-image policy by minimizing the KL divergence between the two token distributions. It reports competitive results on fine-grained visual understanding benchmarks without relying on external models or ground-truth labels.
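
The distillation signal described above can be illustrated with a minimal sketch. It assumes the crop-conditioned pass acts as a fixed teacher and the full-image pass as the student, and it uses KL(teacher || student) averaged over token positions. The summary does not state the KL direction or the averaging, so both are assumptions, and all function names and the toy logits are hypothetical rather than taken from the paper.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def distillation_loss(teacher_logits, student_logits):
    """Mean per-token KL(teacher || student) over a sequence.

    Assumption: the teacher is the same model conditioned on an
    evidence-centered crop, the student is the model conditioned on
    the full image, and the teacher is treated as a constant.
    """
    per_token = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_token) / len(per_token)


# Toy example: two token positions, vocabulary of three tokens.
crop_logits = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]]  # hypothetical teacher pass
full_logits = [[1.0, 1.0, 0.0], [0.5, 1.5, 0.5]]   # hypothetical student pass

loss = distillation_loss(crop_logits, full_logits)
```

The loss is zero when the full-image distribution already matches the crop-conditioned one and grows as they diverge, which is the quantity a training loop would push down. In a real setup the logits would come from two forward passes of the same model, and the reverse direction, KL(student || teacher), is an equally plausible choice for on-policy distillation.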
Tags: Vision, Reinforcement learning, Benchmarks

Summary generated by Claude — human-verified