CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection
Signal: 78
Hype: 15
In three lines
CompactAttention speeds up chunked-prefill attention for long-context LLMs using Block-Union KV Selection. The method converts block-sparse masks into GQA-aware, per-group KV block tables, which eliminates explicit KV compaction. On LLaMA-3.1-8B, it achieves a 2.72× speedup at 128K tokens while maintaining accuracy near that of dense attention on the RULER benchmark.
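To make the block-union idea concrete, here is a minimal sketch of how a block-sparse mask might be turned into per-group KV block tables under grouped-query attention (GQA). It is an illustration only, not the paper's implementation: the function name `block_union_tables`, the nested-list mask layout `mask[head][query_block][kv_block]`, and the choice to union over every query block in the chunk are all assumptions made for this example.

```python
from typing import List


def block_union_tables(mask: List[List[List[bool]]], group_size: int) -> List[List[int]]:
    """Build one KV block table per GQA group (hypothetical helper).

    mask[h][q][k] is True when query head h, query block q attends to KV block k.
    Query heads h in [g * group_size, (g + 1) * group_size) are assumed to share
    KV head g. The table for group g is the sorted union of every KV block that
    any head or query block in that group selects.
    """
    num_heads = len(mask)
    if group_size <= 0 or num_heads % group_size != 0:
        raise ValueError("number of query heads must be a multiple of group_size")

    tables: List[List[int]] = []
    for g in range(num_heads // group_size):
        selected = set()
        for h in range(g * group_size, (g + 1) * group_size):
            for q_row in mask[h]:
                # Collect the indices of the KV blocks this row keeps.
                selected.update(k for k, keep in enumerate(q_row) if keep)
        tables.append(sorted(selected))
    return tables


# Toy example: 4 query heads, 2 heads per KV group, 1 query block, 6 KV blocks.
toy_mask = [
    [[True, False, False, True, False, False]],   # head 0 selects blocks 0, 3
    [[True, False, True, False, False, False]],   # head 1 selects blocks 0, 2
    [[False, False, False, False, True, True]],   # head 2 selects blocks 4, 5
    [[False, True, False, False, False, True]],   # head 3 selects blocks 1, 5
]
tables = block_union_tables(toy_mask, group_size=2)
print(tables)  # [[0, 2, 3], [1, 4, 5]]
```

In this sketch, an attention kernel would read KV blocks through each group's table (an indirection) rather than first copying the selected blocks into a contiguous buffer, which is one plausible reading of "eliminating explicit KV compaction".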
Summary generated by Claude — human-verified