MiniMax Sparse Attention (MSA)
Signal: 82
Hype: 25
In three lines
MiniMax introduces MiniMax Sparse Attention (MSA), a blockwise sparse attention mechanism built on grouped-query attention (GQA) for ultra-long contexts of up to 1M tokens. On a 109B multimodal model, MSA reduces per-token attention compute by 28.4x at 1M context, with 14.2x prefill and 7.6x decoding speedups on H800 GPUs. The code and the MiniMax-M3 model have been released.
Summary generated by Claude — human-verified