MiniMax dropped a new attention architecture. [N]
MiniMax introduces a new attention architecture, MSA, that natively supports 1M-token contexts without quadratic complexity. Its 'KV outer gather Q' approach reportedly runs 4× faster than Flash-Sparse-Attention and cuts compute to 1/20th, with a 9× prefilling speedup and a 15× decoding speedup. The release is billed as the first open-weight model to combine frontier coding, 1M context, and native multimodality.