HEED: Density-Weighted Residual Alignment for Hybrid Vision-Language Model Distillation
HEED introduces density-weighted residual alignment for distilling vision-language models (e.g., Qwen3-VL-8B) into hybrid Mamba-2/attention architectures. The method targets high-density patches (text, fine details), which exhibit 3.6× larger residual drift than lower-density patches. Reported results: +8.7 points on OCRBench v2, +5.13 points on average, 4.12× throughput, and 68% memory savings.
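
To make the idea of density weighting concrete, the following is a minimal sketch of one way such a loss could be written. It is not HEED's implementation: the summary above does not define the density score, the weighting function, or the distance used for alignment. The sketch assumes a scalar density score per patch, softmax-normalized weights that average to 1, and a per-patch mean squared error between student and teacher residual-stream states. The names density_weights, density_weighted_alignment_loss, and temperature are hypothetical.

```python
import numpy as np


def density_weights(density, temperature=1.0):
    """Map per-patch density scores to weights that average to 1.

    Assumption: softmax weighting; the source does not specify the scheme.
    """
    z = np.asarray(density, dtype=np.float64) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    w = np.exp(z)
    return w * (w.size / w.sum())  # rescale so the mean weight is 1


def density_weighted_alignment_loss(student_resid, teacher_resid, density,
                                    temperature=1.0):
    """Weighted mean of per-patch squared error between residual states.

    student_resid, teacher_resid: arrays of shape (num_patches, hidden_dim)
    density: array of shape (num_patches,), higher means denser content
    Assumption: MSE as the alignment distance.
    """
    s = np.asarray(student_resid, dtype=np.float64)
    t = np.asarray(teacher_resid, dtype=np.float64)
    per_patch = np.mean((s - t) ** 2, axis=-1)  # shape (num_patches,)
    w = density_weights(density, temperature)
    return float(np.mean(w * per_patch))
```

With uniform density scores every weight equals 1 and the loss reduces to ordinary mean squared error, so the weighting only redistributes emphasis toward denser patches without changing the overall scale of the loss. The temperature parameter controls how sharply that emphasis concentrates.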