arXiv cs.LG·22 May 2026

EntmaxKV: Support-Aware Decoding for Entmax Attention


In three lines: EntmaxKV introduces a sparse decoding framework for entmax attention that exploits the exact zeros entmax produces, in contrast to softmax's dense tails. It combines query-aware page scoring, support-aware candidate selection, and sparse entmax attention. Reported speedups are 3.36× (softmax) and 5.43× (entmax) at 1M context length, using a reduced fraction of the KV cache.
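To make the "exact zeros versus dense tails" point concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the bisection solver, the choice of alpha = 1.5, and the toy scores are all assumptions made for illustration. It only shows that entmax can assign exactly zero weight to low-scoring keys, while softmax gives every key some positive weight.

```python
import math


def softmax(scores):
    """Standard softmax: every entry receives strictly positive weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entmax_bisect(scores, alpha=1.5, iters=60):
    """alpha-entmax (alpha > 1) computed by bisection on the threshold tau.

    Solves sum_i max((alpha - 1) * s_i - tau, 0) ** (1 / (alpha - 1)) == 1.
    Illustrative helper only; the name and solver are assumptions.
    """
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    z = [(alpha - 1.0) * s for s in scores]
    exponent = 1.0 / (alpha - 1.0)

    def mass(tau):
        return sum(max(v - tau, 0.0) ** exponent for v in z)

    hi = max(z)        # mass(hi) == 0, so tau lies below this
    lo = hi - 1.0      # the top entry alone contributes 1, so mass(lo) >= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mass(mid) >= 1.0:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    probs = [max(v - tau, 0.0) ** exponent for v in z]
    total = sum(probs)  # renormalize to absorb the residual bisection error
    return [p / total for p in probs]


# Toy attention scores for one query against five keys (made-up values).
scores = [2.0, 1.0, 0.1, -1.0, -3.0]
dense = softmax(scores)
sparse = entmax_bisect(scores, alpha=1.5)

# Indices of keys that receive nonzero entmax weight (the "support").
support = [i for i, p in enumerate(sparse) if p > 0.0]
print("softmax:", [round(p, 4) for p in dense])
print("entmax :", [round(p, 4) for p in sparse])
print("support:", support)
```

Keys outside the support contribute nothing to the attention output, so a decoder that can identify the support ahead of time would not need to read their KV cache entries. How EntmaxKV predicts that support (its page scoring and candidate selection) is not described in the summary and is not modeled here.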


Reasoning Benchmarks Infrastructure Papers

Summary generated by Claude — human-verified
