arXiv cs.CL·19 May 2026

CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

In three lines: CompactAttention speeds up chunked-prefill attention for long-context LLMs using Block-Union KV Selection. The method converts block-sparse masks into GQA-aware, per-group KV block tables, which removes the need for explicit KV compaction. On LLaMA-3.1-8B, it achieves a 2.72× speedup at 128K tokens while keeping accuracy on the RULER benchmark close to that of dense attention.
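As a rough illustration of the block-union idea, the sketch below takes a per-query-head block-sparse mask for one prefill chunk and builds one KV block table per GQA group by taking the union of the blocks selected by the query heads that share a KV head. This is a minimal sketch under stated assumptions, not the paper's kernel: the function name `block_union_tables`, the mask layout (`mask[h][b]`), and the assumption that consecutive query heads share a KV head are all hypothetical choices made for the example.

```python
from typing import List


def block_union_tables(mask: List[List[bool]], num_kv_heads: int) -> List[List[int]]:
    """Build one KV block table per GQA group (hypothetical helper).

    mask[h][b] is True when query head h selects KV block b for the
    current chunk. Query heads are assumed to be grouped consecutively,
    so heads [g * group_size, (g + 1) * group_size) share KV head g.
    """
    num_q_heads = len(mask)
    if num_kv_heads <= 0 or num_q_heads % num_kv_heads != 0:
        raise ValueError("query heads must divide evenly into KV heads")
    group_size = num_q_heads // num_kv_heads
    num_blocks = len(mask[0]) if mask else 0

    tables: List[List[int]] = []
    for g in range(num_kv_heads):
        heads = mask[g * group_size:(g + 1) * group_size]
        # Union over the group: keep block b if any head in the group selects it.
        table = [b for b in range(num_blocks) if any(row[b] for row in heads)]
        tables.append(table)
    return tables


# 4 query heads, 2 KV heads, 6 KV blocks (illustrative values only).
example_mask = [
    [True, False, False, True, False, False],   # head 0 (group 0)
    [False, False, True, True, False, False],   # head 1 (group 0)
    [False, False, False, False, False, True],  # head 2 (group 1)
    [False, True, False, False, False, True],   # head 3 (group 1)
]
tables = block_union_tables(example_mask, num_kv_heads=2)
print(tables)  # [[0, 2, 3], [1, 5]]
```

In a design like this, an attention kernel could read only the listed blocks in place through the table, which is one way to avoid copying the selected KV entries into a compacted buffer. Whether the paper's implementation works exactly this way is not stated in the summary above.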

Reasoning Benchmarks Infrastructure

Summary generated by Claude — human-verified
