EntmaxKV: Support-Aware Decoding for Entmax Attention
Signal: 78
Hype: 15
In three lines
EntmaxKV introduces a sparse decoding framework for entmax attention that exploits the exact zeros entmax produces, in contrast to softmax's dense tails. It combines query-aware page scoring, support-aware candidate selection, and sparse entmax attention. It achieves speedups of 3.36× (softmax) and 5.43× (entmax) at 1M context while using a reduced fraction of the KV cache.
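The difference between exact zeros and dense tails can be seen in a small sketch. The code below is not taken from EntmaxKV; it assumes sparsemax, the α = 2 special case of α-entmax, as a stand-in because it has a short closed-form solution, and the score values are made up for illustration. Softmax assigns a positive weight to every key, while sparsemax zeroes out low-scoring keys entirely, so only the keys in the support would need to be read from the KV cache.

```python
import math

def softmax(scores):
    """Dense normalization: every entry receives a strictly positive weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparsemax(scores):
    """Euclidean projection onto the probability simplex.

    This is alpha-entmax with alpha = 2 (an assumption made for
    illustration; the summarized work may use a different alpha).
    """
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, zk in enumerate(z, start=1):
        cumsum += zk
        # The support is a prefix of the sorted scores; stop once the
        # k-th largest score no longer clears the threshold condition.
        if 1.0 + k * zk > cumsum:
            tau = (cumsum - 1.0) / k
        else:
            break
    return [max(s - tau, 0.0) for s in scores]

# Hypothetical attention scores for one query over five keys.
scores = [2.0, 1.5, 0.1, -1.0, -3.0]

dense = softmax(scores)
sparse = sparsemax(scores)
support = [i for i, p in enumerate(sparse) if p > 0.0]

print(dense)    # all five weights are positive
print(sparse)   # [0.75, 0.25, 0.0, 0.0, 0.0]
print(support)  # [0, 1]
```

In this toy case only two of the five keys carry any weight, which is the property a support-aware decoder can use to skip the remaining cache entries.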
Summary generated by Claude — human-verified