Back to feed
Reddit r/LocalLLaMA·

MiniMax Sparse Attention (MSA)

Signal
82
Hype
25
In three linesMiniMax introduces MSA (Sparse Attention), a blockwise sparse attention built on GQA for ultra-long contexts (up to 1M tokens). On a 109B multimodal model, MSA reduces per-token attention compute by 28.4x at 1M context, with 14.2x prefill and 7.6x decoding speedups on H800. Code and MiniMax-M3 model released.
Read source
Your take?
ReasoningInfrastructureOpen sourceBenchmarks

Summary generated by Claude — human-verified