"inference falls back to dense attention" for MiniMax M3 - does it mean all 428B weights are used at each step?
MiniMax M3 on Hugging Face falls back to dense attention because sparse attention is not yet supported. Does this mean that all 428B weights are used at each step, rather than only a subset of them? If so, I would expect a significant performance impact.
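
To make the question concrete, here is a rough sketch of the two quantities I may be mixing up: how many previous positions a new token attends to (what dense vs. sparse attention changes) and how many weights are active per token (what expert routing would change, assuming the model is a mixture-of-experts, which I have not confirmed). The function names and every number below are hypothetical placeholders, not MiniMax M3's actual configuration.

```python
from typing import Optional


def attended_positions(context_len: int, top_k: Optional[int] = None) -> int:
    """Key/value positions one new token attends to in a single layer.

    top_k=None models dense attention (every previous position).
    A top_k value models a generic sparse scheme that keeps at most
    top_k positions. This is an illustration, not M3's real mechanism.
    """
    if context_len < 0:
        raise ValueError("context_len must be non-negative")
    if top_k is None:
        return context_len
    return min(top_k, context_len)


def active_params(shared: float, per_expert: float, experts_per_token: int) -> float:
    """Weights touched per token under mixture-of-experts routing.

    Assumes an MoE layout (hypothetical here): shared weights are always
    used, plus the weights of the experts selected for this token.
    Attention sparsity does not appear in this formula at all.
    """
    return shared + per_expert * experts_per_token


# Hypothetical numbers, chosen only to show the shape of the comparison.
CONTEXT_LEN = 100_000
TOP_K = 2_048

dense = attended_positions(CONTEXT_LEN)
sparse = attended_positions(CONTEXT_LEN, TOP_K)
print(f"positions attended per token: dense={dense}, sparse={sparse}")
```

If this picture is right, the dense fallback would make attention more expensive at long context lengths without changing how many weights are active per token, but I would appreciate confirmation.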