arXiv cs.LG

EntmaxKV: Support-Aware Decoding for Entmax Attention

In three lines: EntmaxKV introduces a sparse decoding framework for entmax attention that exploits the exact zeros entmax produces, in contrast to softmax's dense tails. It combines query-aware page scoring, support-aware candidate selection, and sparse entmax attention. It reports speedups of 3.36× (softmax) and 5.43× (entmax) at 1M context while using a reduced fraction of the KV cache.
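
To make the "exact zeros versus dense tails" contrast concrete, here is a minimal NumPy sketch. It is not the paper's implementation: it uses sparsemax, the alpha = 2 member of the entmax family, because it has a short closed-form solution, and the function names and the toy `scores` vector are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Dense: every entry gets strictly positive probability.
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    # Entmax with alpha = 2: Euclidean projection onto the simplex.
    z_sorted = np.sort(z)[::-1]              # scores in descending order
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, z.size + 1)
    in_support = 1 + ks * z_sorted > cumsum  # which sorted entries stay nonzero
    k = ks[in_support][-1]                   # support size
    tau = (cumsum[k - 1] - 1) / k            # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)

# Toy attention scores of one query against five keys (assumed values).
scores = np.array([2.0, 1.5, 0.1, -1.0, -3.0])

p_soft = softmax(scores)
p_sparse = sparsemax(scores)
support = np.flatnonzero(p_sparse)  # keys with nonzero attention weight

print("softmax  :", np.round(p_soft, 4))
print("sparsemax:", p_sparse)
print("support  :", support)
```

In this toy case only the keys in `support` carry nonzero weight, so only their values would need to be read to compute the attention output exactly. Softmax assigns every key a positive weight, so skipping any key there is an approximation.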

Summary generated by Claude — human-verified