Reddit r/MachineLearning·3 June 2026

MiniMax dropped a new attention architecture. [N]

Signal

Hype

In three linesMiniMax introduces a new attention architecture (MSA) natively supporting 1M tokens without quadratic complexity. 'KV outer gather Q' approach delivers 4× faster than Flash-Sparse-Attention, compute reduced to 1/20th, 9× prefilling speedup, 15× decoding speedup. First open-weight model combining frontier coding, 1M context, and native multimodality.

Read source

Your take?

Reasoning Code generation Vision AI Agents Infrastructure

Summary generated by Claude — human-verified

MiniMax dropped a new attention architecture. [N]

Other angles on this story