arXiv cs.CL·19 May 2026

HEED: Density-Weighted Residual Alignment for Hybrid Vision-Language Model Distillation

In three lines: HEED introduces density-weighted residual alignment for distilling vision-language models (e.g., Qwen3-VL-8B) into hybrid Mamba-2/attention architectures. The method targets high-density patches (text, fine details), which experience 3.6× larger residual drift. Reported results: +8.7 points on OCRBench v2, +5.13 points on average, 4.12× throughput, and 68% memory savings.
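To make the idea of density weighting concrete, here is a minimal sketch of what a density-weighted alignment loss could look like. This is an illustration under stated assumptions, not the paper's formulation: the summary does not say how patch density is measured or what form the loss takes, so the function name `density_weighted_alignment_loss`, the per-patch `density` scores, and the weighted mean-squared-error form are all hypothetical.

```python
from typing import Sequence


def density_weighted_alignment_loss(
    teacher: Sequence[Sequence[float]],
    student: Sequence[Sequence[float]],
    density: Sequence[float],
) -> float:
    """Weighted mean-squared error between per-patch hidden states.

    ASSUMPTION: `density` holds one non-negative score per patch
    (higher for text or fine detail). How such scores are computed
    is not described in the summary and is left to the caller.
    """
    if not (len(teacher) == len(student) == len(density)):
        raise ValueError("teacher, student and density must have equal length")
    if not teacher:
        raise ValueError("at least one patch is required")

    total = sum(density)
    if total > 0:
        # Normalise so the weights sum to 1.
        weights = [d / total for d in density]
    else:
        # Fall back to uniform weights if every density score is zero.
        weights = [1.0 / len(density)] * len(density)

    loss = 0.0
    for t_vec, s_vec, w in zip(teacher, student, weights):
        if len(t_vec) != len(s_vec):
            raise ValueError("hidden vectors must have matching dimensions")
        # Mean squared error over the hidden dimension for this patch.
        patch_mse = sum((t - s) ** 2 for t, s in zip(t_vec, s_vec)) / len(t_vec)
        loss += w * patch_mse
    return loss


# Two patches: the first (dense) is misaligned, the second matches exactly.
teacher_states = [[1.0, 1.0], [0.0, 0.0]]
student_states = [[0.0, 0.0], [0.0, 0.0]]

weighted = density_weighted_alignment_loss(teacher_states, student_states, [3.0, 1.0])
uniform = density_weighted_alignment_loss(teacher_states, student_states, [1.0, 1.0])
```

With these toy inputs, the misaligned dense patch contributes more to the weighted loss than to the uniform one, which is the intuition behind concentrating the alignment signal on patches that drift most.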

Vision, Fine-tuning, Benchmarks, Code generation

Summary generated by Claude — human-verified
