arXiv cs.LG

Quantized Keys Steal Attention: Bias Correction for KV-Cache Compression in Video Diffusion

Signal: 78
Hype: 15
In three lines
Autoregressive video diffusion models quantize their KV caches to reduce memory, but quantizing the keys introduces a systematic attention bias (Jensen bias) that degrades output quality. The authors propose a correction, applied to each attention score and computed from the quantization step sizes, that recovers the quality lost with INT2 quantization while using 50% less memory than INT4.
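
To make the mechanism in the summary concrete, here is a minimal NumPy sketch of why rounding error in the keys inflates softmax weights and how a step-size-based term could offset it. It is an illustration under stated assumptions, not the paper's method: the round-to-nearest quantizer, the uniform-error model (variance step²/12 per dimension), the second-order bias estimate, and the names `quantize_uniform` and `jensen_log_bias` are all hypothetical choices made here. The query is unit-normalized instead of applying the usual 1/sqrt(d) scaling.

```python
import numpy as np

def quantize_uniform(x, step):
    """Round-to-nearest uniform quantizer (no clipping); assumed for illustration."""
    return np.round(x / step) * step

def jensen_log_bias(q, step):
    """Second-order estimate of log E[exp(q . e)] for rounding error e.

    Assumes each error component is roughly uniform on [-step/2, step/2],
    so Var(e_i) = step**2 / 12 and the log-bias is about
    0.5 * ||q||^2 * step**2 / 12.
    """
    return 0.5 * np.sum(q * q, axis=-1) * step**2 / 12.0

rng = np.random.default_rng(0)
d, n_keys, step = 16, 200_000, 1.0

q = rng.standard_normal(d)
q /= np.linalg.norm(q)                     # unit-norm query (assumption)
keys = rng.standard_normal((n_keys, d))    # synthetic keys, not real cache data
keys_hat = quantize_uniform(keys, step)

# Error in each attention logit caused by quantizing the key.
err = (keys_hat - keys) @ q

# The error is close to zero-mean, yet exp() is convex, so the average
# unnormalized softmax weight is inflated: log E[exp(err)] > 0.
measured = np.log(np.mean(np.exp(err)))
predicted = jensen_log_bias(q, step)

# Subtracting the predicted bias from each logit removes most of the inflation.
residual = np.log(np.mean(np.exp(err - predicted)))

print(f"measured log-bias : {measured:.4f}")
print(f"predicted log-bias: {predicted:.4f}")
print(f"residual after correction: {residual:.4f}")
```

In this toy setting the bias grows with the square of the step size, which is consistent with the summary's point that coarser formats such as INT2 are hit harder than INT4.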
Video generation, Reasoning, Benchmarks

Summary generated by Claude — human-verified