arXiv cs.LG · 27 May 2026

Quantized Keys Steal Attention: Bias Correction for KV-Cache Compression in Video Diffusion

In three lines: Autoregressive video diffusion models quantize their KV caches to reduce memory, but quantizing the keys introduces a systematic attention bias (the Jensen bias) that degrades output quality. The authors propose a per-attention-score correction computed from the quantization step sizes. It recovers the quality lost under INT2 quantization while using 50% less memory than INT4.
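
The direction of such a bias can be illustrated with a small NumPy sketch. This is not the paper's formula, which the summary does not state. It assumes each key coordinate's rounding error is uniform on [-step/2, step/2] and independent, so the logit error has variance sigma^2 = sum_i q_i^2 step_i^2 / (12 d). By Jensen's inequality the expected exponentiated logit is then inflated by roughly exp(sigma^2 / 2), and subtracting sigma^2 / 2 from each score is one plausible correction. The names `logit_variance` and `corrected_logits` are hypothetical.

```python
import numpy as np

def logit_variance(query, step):
    """Variance of the logit error q.e/sqrt(d), modeling each rounding
    error e_i as uniform on [-step_i/2, step_i/2] (an assumption)."""
    d = query.shape[-1]
    return np.sum((query * step) ** 2, axis=-1) / (12.0 * d)

def corrected_logits(query, keys_dequant, step):
    """Scaled dot-product scores minus a second-order Jensen term.
    Assumed form for illustration, not the authors' exact correction."""
    d = query.shape[-1]
    raw = keys_dequant @ query / np.sqrt(d)
    return raw - 0.5 * logit_variance(query, step)

rng = np.random.default_rng(0)
d = 64
query = rng.normal(size=d)
keys = rng.normal(size=(8, d))

# Six keys are coarsely quantized; the last two stay in full precision (step 0).
step = np.array([[0.5]] * 6 + [[0.0]] * 2)
safe = np.where(step > 0, step, 1.0)  # avoid dividing by zero
keys_dq = np.where(step > 0, np.round(keys / safe) * safe, keys)

logits = corrected_logits(query, keys_dq, step)
weights = np.exp(logits - logits.max())
weights /= weights.sum()  # softmax over the corrected scores
```

In this sketch the correction is zero for full-precision keys and grows with the square of the step size, so coarser quantization is penalized more heavily.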

Video generation Reasoning Benchmarks

Summary generated by Claude — human-verified
