arXiv cs.CL

CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

Signal: 78
Hype: 15
In three lines
CompactAttention accelerates chunked-prefill attention for long-context LLMs using Block-Union KV Selection. The method converts block-sparse masks into GQA-aware, per-group KV block tables, eliminating explicit KV compaction. On LLaMA-3.1-8B, it achieves a 2.72× speedup at 128K tokens while keeping accuracy close to dense attention on the RULER benchmark.
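
The summary does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of "block-union" selection. It assumes that each group of query heads sharing a KV head (GQA) keeps every KV block selected by at least one head in the group. The function name `block_union_tables`, the `mask` layout, and the `group_size` parameter are hypothetical and not taken from the paper.

```python
from typing import List


def block_union_tables(mask: List[List[bool]], group_size: int) -> List[List[int]]:
    """Build one KV block table per GQA group from a block-sparse mask.

    mask[h][b] is True when query head h selects KV block b (assumed layout).
    A group is `group_size` consecutive query heads sharing one KV head.
    """
    if group_size <= 0 or len(mask) % group_size != 0:
        raise ValueError("number of query heads must be a multiple of group_size")
    tables = []
    for start in range(0, len(mask), group_size):
        heads = mask[start:start + group_size]
        num_blocks = len(heads[0])
        # Union: keep a block if any head in the group selected it.
        table = [b for b in range(num_blocks) if any(h[b] for h in heads)]
        tables.append(table)
    return tables


# Toy example: 4 query heads, 2 heads per KV group, 6 KV blocks.
mask = [
    [True, False, False, True, False, False],
    [True, False, True, False, False, False],
    [False, False, False, False, True, True],
    [False, True, False, False, False, True],
]
tables = block_union_tables(mask, group_size=2)
print(tables)  # [[0, 2, 3], [1, 4, 5]]
```

Under this reading, an attention kernel could look up KV blocks through each group's table instead of first copying the selected blocks into a contiguous buffer, which is how "eliminating explicit KV compaction" is interpreted here.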
Reasoning · Benchmarks · Infrastructure

Summary generated by Claude — human-verified