Reddit r/LocalLLaMA·12 June 2026

MiniMax Sparse Attention (MSA)

Signal

Hype

In three linesMiniMax introduces MSA (Sparse Attention), a blockwise sparse attention built on GQA for ultra-long contexts (up to 1M tokens). On a 109B multimodal model, MSA reduces per-token attention compute by 28.4x at 1M context, with 14.2x prefill and 7.6x decoding speedups on H800. Code and MiniMax-M3 model released.

Read source

Your take?

Reasoning Infrastructure Open source Benchmarks

Summary generated by Claude — human-verified

MiniMax Sparse Attention (MSA)

Other angles on this story