Reddit r/MachineLearning

MiniMax dropped a new attention architecture. [N]

Signal: 72 · Hype: 35
In three lines:
MiniMax has introduced MSA, a new attention architecture that natively supports 1M-token contexts without quadratic complexity.
Its "KV outer gather Q" approach runs 4× faster than Flash-Sparse-Attention, reduces compute to 1/20th, and delivers a 9× prefill speedup and a 15× decoding speedup.
The release is pitched as the first open-weight model to combine frontier coding, 1M-token context, and native multimodality.
Reasoning · Code generation · Vision · AI Agents · Infrastructure

Summary generated by Claude — human-verified