arXiv cs.AI

DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention

Signal: 78
Hype: 25
In three lines: DashAttention introduces a differentiable hierarchical sparse attention method that uses an adaptive α-entmax transformation to select a variable number of KV blocks. Unlike NSA and InfLLMv2, it remains fully differentiable and reaches 75% sparsity with accuracy comparable to full attention. A GPU-aware Triton implementation provides a significant speedup.
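To make the "variable number of KV blocks" idea concrete, here is a minimal sketch using sparsemax, which is the α = 2 special case of α-entmax and has a simple closed form. It is an illustration only, not the paper's method or its Triton kernel: the function names `sparsemax` and `select_blocks` and the block scores are hypothetical, and the α that DashAttention actually uses is not stated in this summary.

```python
def sparsemax(scores):
    """Sparsemax (alpha-entmax with alpha = 2): Euclidean projection of the
    scores onto the probability simplex. Unlike softmax, entries can be exactly 0."""
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, zk in enumerate(z, start=1):
        cumsum += zk
        # The support condition holds for a prefix of the sorted scores,
        # so the last k that satisfies it fixes the threshold tau.
        if 1.0 + k * zk > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]


def select_blocks(block_scores):
    """Hypothetical helper: keep the KV blocks that receive nonzero weight."""
    weights = sparsemax(block_scores)
    return [i for i, w in enumerate(weights) if w > 0.0]


# Made-up per-block relevance scores for one query.
peaked = [4.0, 1.0, 0.5, 0.2]   # one block dominates
flat = [1.0, 0.9, 0.8, 0.1]     # several blocks are similarly relevant

print(select_blocks(peaked))    # [0]
print(select_blocks(flat))      # [0, 1, 2]
```

The number of selected blocks follows from the score distribution rather than from a fixed top-k budget, and the weights are a piecewise-linear function of the scores, so gradients can flow through the selection step.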
Reasoning, Benchmarks, Infrastructure

Summary generated by Claude — human-verified